Abstract

The Newton iteration is a popular method for minimising a cost function on 
Euclidean space. Various generalisations to cost functions defined on manifolds 
appear in the literature. In each case, the convergence rate of the generalised 
Newton iteration needed establishing from first principles. The present paper 
presents a framework for generalising iterative methods from Euclidean space to 
manifolds that ensures local convergence rates are preserved. It applies to any 
iterative method computing a coordinate independent property of a function 
(such as a zero or a local minimum). All generalised Newton iterations in the 
literature are believed to be subsumed by this framework. The framework also 
gives new insight into the design of Newton methods in general. 

Keywords: Newton iteration, Newton method, convergence rates, 
optimisation on manifolds, geometric computing 



1. Introduction 

Given a smooth cost function $f : \mathbb{R}^n \to \mathbb{R}$, the Newton iteration function
$N_f : \mathbb{R}^n \to \mathbb{R}^n$ is

    $N_f(x) = x - [H_f(x)]^{-1} \nabla f(x), \quad x \in \mathbb{R}^n$    (1)

where $\nabla f(x)$ and $H_f(x)$ are the gradient and Hessian of $f$, respectively; $N_f$ does
not depend on the choice of inner product with respect to which the gradient and
Hessian are defined. Starting with an initial guess $x_0 \in \mathbb{R}^n$, the Newton method
uses the Newton iteration function to generate the iterates $x_{k+1} = N_f(x_k)$.
Under certain conditions [14], this sequence is well-defined and converges to a
critical point of $f$, meaning $[H_f(x_k)]^{-1}$ exists for all $k$, and $x = \lim_{k \to \infty} x_k$
exists and satisfies $\nabla f(x) = 0$.
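
The iteration (1) can be sketched in one dimension as follows. This is only an
illustration: the test cost $f(x) = e^x - x$ (minimum at $x = 0$), the starting point
and the helper name `newton_iteration` are assumptions made for this sketch and
do not come from the paper.

```python
import math

def newton_iteration(grad, hess):
    """Return the map x -> x - grad(x) / hess(x), i.e. (1) with n = 1."""
    def N(x):
        return x - grad(x) / hess(x)
    return N

# Assumed test cost f(x) = exp(x) - x, whose only critical point is x = 0.
grad = lambda x: math.expm1(x)   # f'(x)  = exp(x) - 1
hess = lambda x: math.exp(x)     # f''(x) = exp(x)

N_f = newton_iteration(grad, hess)

x = 1.0                          # initial guess x_0
errors = []                      # |x_k - 0| for k = 1, 2, ...
for _ in range(5):
    x = N_f(x)
    errors.append(abs(x))

# For a quadratically convergent iteration, e_{k+1} / e_k**2 stays bounded.
ratios = [errors[k + 1] / errors[k] ** 2 for k in range(len(errors) - 1)]
print(errors)
print(ratios)
```

The printed ratios settle near a constant, which is the behaviour that the
definition of quadratic convergence in Section 2 makes precise.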

Let $f : M \to \mathbb{R}$ now be a smooth cost function defined on an $n$-dimensional
manifold $M$. Since $M$ locally looks like $\mathbb{R}^n$, it is natural to ask how the Newton
iteration function (1) can be extended to an iteration function $E_f : M \to M$
such that the iterates $x_{k+1} = E_f(x_k)$ enjoy the same locally quadratic rate of
convergence as do the Euclidean Newton iterates.

One approach Q is to endow the manifold $M$ with a metric and define
$E_f$ by a formula analogous to (1) but with $\nabla f$ and $H_f$ replaced by the
Riemannian gradient $\operatorname{grad} f$ and Hessian $\mathcal{H}_f$ of $f$, and the straight-line increment
$-[H_f(x)]^{-1} \nabla f(x)$ replaced by an increment along a geodesic, namely

    $E_f(p) = \operatorname{Exp}_p\left( -[\mathcal{H}_f(p)]^{-1} \operatorname{grad} f(p) \right)$    (2)

where $\operatorname{Exp}_p$ is the Riemannian exponential map centred at $p$.

While (2) is sensible, it is not ideal. Local quadratic convergence of (2)
must be proved from first principles; it does not follow immediately from the
local quadratic convergence of (1). More importantly, it was brought to attention
in [a that there is no compelling reason to give the manifold $M$ a Riemannian
structure unless perhaps the cost function $f$ itself is somehow related to the
Riemannian geometry. (An example of such a relation is if $f(p) = \sum_{i=1}^m d^2(p, p_i)$,
where the $p_i$ are points on a Riemannian manifold $M$ and $d(\cdot,\cdot)$ the induced
distance function on $M$. The minimum of $f$ is known as the Karcher mean of
the $m$ points $p_1, \dots, p_m$. See 0, ID, E3-) Similarly, the Riemannian approach
(2) does not offer insight into which metric to use if there are two or more
competing metrics, or what to do if there is no convenient choice of metric.

By not imposing a Riemannian geometry on the manifold, it may be possible 
to improve the global behaviour of the generalised Newton method, increase the 
local convergence rate, and decrease the computational complexity per iteration.



For instance, the computational complexity was reduced in [8], by avoiding
the need to evaluate $\operatorname{Exp}_p$ in (2), and empirical evidence presented there
suggested faster convergence too. (The rate of convergence remained quadratic but
the associated constant was apparently decreased.) How to improve the global 
behaviour of a Newton method is touched on in Section POl but falls outside the 
main scope of the present paper. 

On an $n$-dimensional manifold $M$, the following class of generalised Newton
iteration functions was proposed in [8]. For each $p \in M$, choose a
parametrisation $\mu_p : \mathbb{R}^n \to M$ centred at $p$, that is, $\mu_p(0) = p$. Given a cost function
$f : M \to \mathbb{R}$, define

    $E_f(p) = \mu_p\left( N_{f \circ \mu_p}(0) \right)$    (3)

where $N$ denotes the Euclidean Newton iteration function (1). The
interpretation is that $f \circ \mu_p : \mathbb{R}^n \to \mathbb{R}$ is a local version of the cost function centred
at the current point $p$, hence one step of the Euclidean Newton method can
be applied to it, with the result mapped back to the manifold. Section [5] gives
a new and possibly clearer interpretation of (3): it can be obtained from the
Newton method by applying a different change of coordinates at each step.
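
To make (3) concrete, the following sketch applies it on the unit circle in
$\mathbb{R}^2$. Everything specific here is an assumption chosen for illustration and is
not taken from the paper: the cost $f(q) = -\langle a, q \rangle$, the two parametrisations
(the exponential map and a projection-based one), the function names, and the
use of central finite differences for the derivatives of $f \circ \mu_p$ at 0.

```python
import math

def newton_step_at_zero(g, h=1e-4):
    """Euclidean Newton step for g : R -> R at 0, i.e. N_g(0) = -g'(0) / g''(0).
    Derivatives are approximated by central differences (an assumption of this sketch)."""
    d1 = (g(h) - g(-h)) / (2.0 * h)
    d2 = (g(h) - 2.0 * g(0.0) + g(-h)) / (h * h)
    return -d1 / d2

def tangent(p):
    """Unit tangent vector to the circle at p."""
    return (-p[1], p[0])

def mu_exp(p, t):
    """Parametrisation by the Riemannian exponential map: mu_p(0) = p."""
    v = tangent(p)
    return (math.cos(t) * p[0] + math.sin(t) * v[0],
            math.cos(t) * p[1] + math.sin(t) * v[1])

def mu_proj(p, t):
    """Parametrisation by projecting p + t v back onto the circle: mu_p(0) = p."""
    v = tangent(p)
    q = (p[0] + t * v[0], p[1] + t * v[1])
    r = math.hypot(q[0], q[1])
    return (q[0] / r, q[1] / r)

def lifted_newton(f, mu, p):
    """One step of (3): E_f(p) = mu_p(N_{f o mu_p}(0))."""
    t = newton_step_at_zero(lambda s: f(mu(p, s)))
    return mu(p, t)

a = (1.0, 0.0)                              # assumed target point on the circle

def cost(q):
    return -(a[0] * q[0] + a[1] * q[1])     # minimised at q = a

def run(mu, p, steps=4):
    """Distances to the minimiser after each step."""
    errs = []
    for _ in range(steps):
        p = lifted_newton(cost, mu, p)
        errs.append(math.hypot(p[0] - a[0], p[1] - a[1]))
    return errs

p0 = (math.cos(0.5), math.sin(0.5))
errs_exp = run(mu_exp, p0)
errs_proj = run(mu_proj, p0)
print(errs_exp)
print(errs_proj)
```

For this particular cost the projection-based parametrisation reaches the
minimiser in a single step (up to finite-difference error), whereas the
exponential map needs a few steps. This is only one hand-picked instance, but
it is in the spirit of the remark that avoiding $\operatorname{Exp}_p$ can help.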

The class of iteration functions (3) includes (2) as a special case; if $\mu_p =
\operatorname{Exp}_p$ then (3) and (2) are identical because the Euclidean gradient and Hessian
of $f \circ \mu_p$ at 0 are precisely the Riemannian gradient and Hessian of $f$ at $p$,
respectively [13].

The rest of the paper is organised as follows. Section 2 recalls the
definition of local rate of convergence of an iteration function in Euclidean space.
Section 3 reviews the local convergence properties of the Newton iteration and
derives a necessary and sufficient condition for local quadratic convergence to
a non-degenerate critical point. Section 4 introduces the family of coordinate
adapted Newton methods and conjectures all Newton-like algorithms in
Euclidean space belong to this family. Section 4.3 explains how coordinate
adaptation can enhance the performance of the classical Newton iteration. Section 5
defines the local rate of convergence of iteration functions on manifolds.
Section 6 generalises the coordinate adapted Newton method to manifolds.
Local quadratic convergence is proved under mild conditions. For convenience of
speech, these generalised Newton methods on manifolds are described as lifted
from Euclidean space. Section 7 introduces path-dependent Newton methods.
This simultaneously generalises and simplifies the theory and practice of lifting
Newton methods to manifolds. Section 8 discusses techniques for applying the
proposed framework in practice. The recurring theme is a basic concept called
re-centring. Section 9 strips back to the bare essentials the technique used for
lifting the Newton iteration function to manifolds. This gives insight into the
workings of Newton methods on manifolds. It also leads to a rudimentary
framework for lifting any (memoryless) iterative method to a manifold. Section 10
concludes the paper.

The convergence results in this paper differ from those of other studies [3, 0] 
in their generality; the generalised Newton method in Section [6] is believed to 
subsume all other Newton-like methods in the literature. Furthermore, the 
convergence results are modest in their aspirations: only asymptotic rates of 
convergence are studied. (Lower bounds on the radius of convergence are not 
studied explicitly, although a careful study of the constants in the bounds would 
provide that information.) This means simpler and more general conditions 
can be found for ensuring convergence. It also helps lay bare the fundamental 
principles involved in lifting Newton methods to manifolds. 



2. Rate of Convergence in Euclidean Space 

For a function $f$ between Euclidean spaces, the following definitions are
made. The Euclidean norm $\|\cdot\|$ on $\mathbb{R}^n$ is used throughout. The norm of the
second-order derivative $D^2 f(x)$ is $\|D^2 f(x)\| = \sup_{\|\eta\|=1} \|D^2 f(x) \cdot (\eta,\eta)\|$. All
other norms are operator norms. Gradients $\nabla f$ and Hessians $H_f$ are calculated
with respect to the Euclidean inner product. The identity operator is denoted
by $I$ (or sometimes by 1 in the one-dimensional case). The notation $B_n(x;\rho)$
and its abbreviation $B(x;\rho)$ denote the open ball centred at $x \in \mathbb{R}^n$ of radius
$\rho$. Its closure is $\bar{B}(x;\rho)$.

An iteration function $N : \mathbb{R}^n \to \mathbb{R}^n$, which may not be defined on the whole
of $\mathbb{R}^n$, is said to converge locally to $x^* \in \mathbb{R}^n$ with rate $K \in \mathbb{R}$ and constant
$\kappa \in \mathbb{R}$ if there exists an open set $U \subset \mathbb{R}^n$ containing $x^*$ such that $N$ is defined
on $U$ and

    $\forall x \in U, \quad N(x) \in U$ and $\|N(x) - x^*\| \le \kappa \|x - x^*\|^K$.    (4)

If $K = 1$ then it is further required that $\kappa < 1$, and convergence is called linear.
If $K \in (1,2)$ the convergence is super-linear, and if $K = 2$ the convergence is
quadratic.

Although (4) implies $N(x^*) = x^*$, the sequence $x_{k+1} = N(x_k)$ need not
converge to $x^*$ for an arbitrary $x_0 \in U$. Nevertheless, define $\bar\rho = \kappa^{1/(1-K)}$ if
$K > 1$, or $\bar\rho = \infty$ if $K = 1$ and $\kappa < 1$. Then $B(x^*;\rho) \cap U$ is mapped into itself
by $N$ whenever $\rho < \bar\rho$. Moreover, $x_0 \in B(x^*;\rho) \cap U$ implies $x_k \to x^*$.
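
The role of the radius $\bar\rho = \kappa^{1/(1-K)}$ can be checked numerically. The sketch
below uses an assumed toy iteration $N(x) = 2x^2$, which satisfies (4) with
$x^* = 0$, $\kappa = 2$ and $K = 2$ on any interval about the origin; the function names
are hypothetical.

```python
def basin_radius(kappa, K):
    """Radius below which the bound (4) forces the error to shrink."""
    if K > 1:
        return kappa ** (1.0 / (1.0 - K))
    if K == 1 and kappa < 1:
        return float("inf")
    raise ValueError("need K > 1, or K == 1 with kappa < 1")

def iterate(N, x0, steps):
    """Apply the iteration function N to x0 the given number of times."""
    x = x0
    for _ in range(steps):
        x = N(x)
    return x

# Assumed toy iteration with x* = 0, kappa = 2, K = 2.
N = lambda x: 2.0 * x * x

rho_bar = basin_radius(2.0, 2)     # 2 ** (-1)
inside = iterate(N, 0.49, 10)      # start just inside the ball of radius rho_bar
outside = iterate(N, 0.51, 10)     # start just outside it
print(rho_bar, inside, outside)
```

Starting just inside the ball the iterates collapse towards $x^*$, and starting
just outside they grow, so for this example the radius is sharp.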

The focus of this paper is on convergence rates K greater than one. 

A differentiability condition implying (4) is recalled, along with its converse.

Lemma 1. Let $N : \mathbb{R}^n \to \mathbb{R}^n$ be $C^K$-smooth at $x^* \in \mathbb{R}^n$ for some integer $K \ge 2$.
If $D^k N(x^*) = 0$ for $k = 1, \dots, K-1$ then $N$ converges locally to $x^*$ with rate
$K$.

Lemma 2. Assume $N : \mathbb{R}^n \to \mathbb{R}^n$ satisfies (4) for some integer $K \ge 2$ and
point $x^* \in \mathbb{R}^n$. If $N$ is $C^{K-1}$ at $x^*$ then $D^k N(x^*) = 0$ for $k = 1, \dots, K-1$.

Proofs follow from Taylor series expansions.

3. The Newton Iteration on Euclidean Space 

The material in this section motivates subsequent developments in the paper. 

Let $x^*$ be a critical point of $f$, that is, $\nabla f(x^*) = 0$. The critical point
is said to be degenerate if $H_f(x^*)$ is singular. If $x^*$ is degenerate then the
Newton iteration function is not defined at $x^*$ (except perhaps by continuity)
and convergence to $x^*$ need not be quadratic. For example, if $f(x) = x^3$ then
the Newton iteration function $N_f(x) = \frac{1}{2}x$ converges linearly to the degenerate
critical point at the origin. Even if $x^*$ is non-degenerate, quadratic convergence
is not guaranteed.

Example 3. Define $f(x) = x^2 + |x|^{5/2}$. The origin is a non-degenerate critical
point. The Newton iteration function is $N_f(x) = \frac{5x|x|^{1/2}}{8 + 15|x|^{1/2}}$ and has super-linear
but not quadratic convergence, despite $f$ being $C^2$-smooth.
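
The rate in Example 3 can be observed numerically. The sketch below evaluates
the Newton step for $f(x) = x^2 + |x|^{5/2}$ from its exact first and second
derivatives and compares the new error with $|x|^{3/2}$ and with $|x|^2$; the function
names are hypothetical.

```python
def newton_map(x):
    """Newton iteration N_f(x) = x - f'(x) / f''(x) for f(x) = x**2 + |x|**2.5."""
    sign = 1.0 if x >= 0 else -1.0
    grad = 2.0 * x + 2.5 * sign * abs(x) ** 1.5   # f'(x)
    hess = 2.0 + 3.75 * abs(x) ** 0.5             # f''(x)
    return x - grad / hess

def rate_ratios(x):
    """Return (|N(x)| / |x|**1.5, |N(x)| / |x|**2) for the critical point 0."""
    y = abs(newton_map(x))
    return y / abs(x) ** 1.5, y / abs(x) ** 2

for x in (1e-2, 1e-4, 1e-6):
    print(x, rate_ratios(x))
```

The first ratio tends to 5/8 as $x \to 0$ while the second grows without bound,
consistent with convergence of rate 3/2 rather than 2.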

Applying Lemma 1 to (1) shows that $f$ being $C^4$-smooth is sufficient for $N_f$
to converge locally quadratically to a nondegenerate critical point. If $f$ were
only $C^3$ then $N_f$ would only be $C^1$ and Lemma 1 could not be applied. By using
bounds though, it is established that $f$ being $C^3$ is sufficient for local quadratic
convergence to a nondegenerate critical point.

Theorem 4. Let $f : \mathbb{R}^n \to \mathbb{R}$ be $C^2$-smooth. Let $x^* \in \mathbb{R}^n$ be a non-degenerate
critical point, that is, $\nabla f(x^*) = 0$ and $H_f(x^*)$ is invertible. A necessary and
sufficient condition for $N_f$ in (1) to be locally quadratically convergent to $x^*$ is
for there to exist $\eta, \delta > 0$ such that $x \in B(x^*;\delta)$ implies

    $\| [H_f(x) - H_f(x^*)] (x - x^*) \| \le \eta \|x - x^*\|^2$.    (5)

Proof. Define the second-order Taylor series remainder term

    $R(x) = f(x) - f(x^*) - \frac{1}{2}(x - x^*)^T H_f(x^*)(x - x^*)$.    (6)

Since $f$ is $C^2$, so is $R$. Moreover,

    $\nabla f(x) = H_f(x^*)(x - x^*) + \nabla R(x)$,    (7)
    $H_f(x) = H_f(x^*) + H_R(x)$.    (8)

Substitution into (1) shows

    $N_f(x) - x^* = x - x^* - [H_f(x)]^{-1} [H_f(x^*)(x - x^*) + \nabla R(x)]$    (9)
    $\phantom{N_f(x) - x^*} = [H_f(x)]^{-1} [H_R(x)(x - x^*) - \nabla R(x)]$.    (10)

Since $H_R$ is continuous, for any $\epsilon > 0$ there exists a $\rho > 0$ such that $x \in B(x^*;\rho)$
implies: $H_f(x)$ is invertible; $\|[H_f(x)]^{-1}\| < \|[H_f(x^*)]^{-1}\| + \epsilon$; and $\|H_f(x)\| <
\|H_f(x^*)\| + \epsilon$.

To prove sufficiency, first observe

    $\|N_f(x) - x^*\| \le \|[H_f(x)]^{-1}\| \left( \|H_R(x)(x - x^*)\| + \|\nabla R(x)\| \right)$.    (11)

Choose $\delta, \eta$ as in the theorem. If $x \in B(x^*;\delta)$ then

    $\|\nabla R(x)\| \le \int_0^1 \frac{1}{t} \|H_R(x^* + t(x - x^*)) \, t(x - x^*)\| \, dt$    (12)
    $\phantom{\|\nabla R(x)\|} \le \int_0^1 t \eta \|x - x^*\|^2 \, dt$    (13)
    $\phantom{\|\nabla R(x)\|} \le \frac{1}{2} \eta \|x - x^*\|^2$.    (14)

Choosing $\epsilon, \rho$ as above, if $x \in B(x^*; \min\{\delta, \rho\})$ then $N_f(x)$ is well-defined and

    $\|N_f(x) - x^*\| \le \frac{3\eta}{2} \left( \|[H_f(x^*)]^{-1}\| + \epsilon \right) \|x - x^*\|^2$,    (15)

proving local quadratic convergence.

To prove necessity, first note from (10) that

    $\|N_f(x) - x^*\| \ge \|H_f(x)\|^{-1} \|H_R(x)(x - x^*) - \nabla R(x)\|$.    (16)

Thus, choosing $\epsilon, \rho$ as above, if $x \in B(x^*;\rho)$ then

    $\|H_R(x)(x - x^*) - \nabla R(x)\| \le (\|H_f(x^*)\| + \epsilon) \|N_f(x) - x^*\|$.    (17)

By hypothesis, $N_f$ converges locally quadratically to $x^*$, hence by shrinking $\rho$
if necessary, there exists a $\kappa > 0$ such that $x \in B(x^*;\rho)$ implies

    $\|H_R(x)(x - x^*) - \nabla R(x)\| \le \kappa \|x - x^*\|^2$.    (18)

Define the closed ball $C = \bar{B}(x^*;\rho/2)$ and the function $\phi(x) = \|H_R(x)(x -
x^*)\| \|x - x^*\|^{-1}$. Setting $\phi(x^*) = 0$ ensures $\phi$ is well-defined and continuous on $C$.
Assume to the contrary, for all $\eta > 0$, the scalar $h = \max_{x \in C} \{\phi(x) - \eta \|x - x^*\|\}$
satisfies $h > 0$. For any $x \in C$,

    $\|\nabla R(x)\| \le \int_0^1 \|H_R(x^* + t(x - x^*))(x - x^*)\| \, dt$    (19)
    $\phantom{\|\nabla R(x)\|} = \|x - x^*\| \int_0^1 \phi(x^* + t(x - x^*)) \, dt$    (20)
    $\phantom{\|\nabla R(x)\|} \le \|x - x^*\| \int_0^1 h + t \eta \|x - x^*\| \, dt$    (21)
    $\phantom{\|\nabla R(x)\|} = h \|x - x^*\| + \frac{1}{2} \eta \|x - x^*\|^2$.    (22)

Let $z \in C$ be such that $\phi(z) - \eta \|z - x^*\| = h$. Since $z \ne x^*$ and

    $\|H_R(z)(z - x^*) - \nabla R(z)\| \ge \phi(z) \|z - x^*\| - h \|z - x^*\| - \frac{1}{2} \eta \|z - x^*\|^2$    (23)
    $\phantom{\|H_R(z)(z - x^*) - \nabla R(z)\|} = \frac{1}{2} \eta \|z - x^*\|^2$,    (24)

choosing $\eta > 2\kappa$ makes (24) contradict (18), proving the theorem. □

Corollary 5. Let $f : \mathbb{R}^n \to \mathbb{R}$ be $C^3$-smooth and $x^* \in \mathbb{R}^n$ a non-degenerate
critical point. Then $N_f$ in (1) converges locally quadratically to $x^*$.

Proof. If $f$ is $C^3$ then $H_f(x) - H_f(x^*)$ is $C^1$, hence (5) holds. □

Corollary 6. Let $f$ and $x^*$ satisfy the conditions in Theorem 4, including (5).
The perturbed iteration function $E_f(x) = x - [H_f(x) + G(x)]^{-1} \nabla f(x)$ converges
locally quadratically to $x^*$ if there exists a $\gamma \in \mathbb{R}$ such that the operator norm of
the matrix $G(x)$ satisfies $\|G(x)\| \le \gamma \|x - x^*\|$ in a neighbourhood of $x^*$.

Proof. Observe

    $E_f(x) - x^* = [H_f(x) + G(x)]^{-1} \{ H_f(x)(x - x^*) - \nabla f(x) + G(x)(x - x^*) \}$.    (25)

Therefore,

    $\|E_f(x) - x^*\| \le \|[H_f(x) + G(x)]^{-1}\| \{ \|\nabla f(x) - H_f(x^*)(x - x^*)\| +
        \|[H_f(x) - H_f(x^*)](x - x^*)\| + \|G(x)(x - x^*)\| \}$.    (26)

In a sufficiently small neighbourhood of $x^*$, $\|[H_f(x) + G(x)]^{-1}\|$ is bounded
above by a constant and the three other terms are bounded by a constant times
$\|x - x^*\|^2$; refer to (14) and the hypotheses on $H_f(x)$ and $G(x)$. □
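
Corollary 6 can be illustrated in one dimension. In the sketch below the cost
$f(x) = e^x - x$ (critical point $x^* = 0$) and the perturbation $G(x) = c f'(x)$ are
assumptions made for illustration; $G$ satisfies $|G(x)| \le \gamma |x|$ near the origin
because $f'(0) = 0$. The function names are hypothetical.

```python
import math

def perturbed_newton(x, c):
    """E_f(x) = x - f'(x) / (f''(x) + G(x)) with f(x) = exp(x) - x, G(x) = c * f'(x)."""
    grad = math.expm1(x)      # f'(x)  = exp(x) - 1
    hess = math.exp(x)        # f''(x) = exp(x)
    return x - grad / (hess + c * grad)

def quadratic_ratio(x, c):
    """|E_f(x) - x*| / |x - x*|**2 with x* = 0."""
    return abs(perturbed_newton(x, c)) / x ** 2

for c in (0.0, 1.0, -0.5):
    print(c, [quadratic_ratio(x, c) for x in (1e-1, 1e-2, 1e-3)])
```

For $c = 0$ (the unperturbed Newton iteration) and $c = 1$ the ratio stays bounded,
near 1/2 and 3/2 respectively. For $c = -1/2$ it tends to zero, so in this instance
the perturbation speeds up convergence rather than merely preserving the
quadratic rate.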
Remark 7. The quadratic convergence rate of the Newton method is coordinate
independent, in the following sense. Assume $f : \mathbb{R}^n \to \mathbb{R}$ satisfies the conditions
in Theorem 4 about the point $x^*$. If $\phi : \mathbb{R}^n \to \mathbb{R}^n$ is a $C^2$-diffeomorphism
then $\phi^{-1}(x^*)$ is a non-degenerate critical point of $f \circ \phi$, and by Proposition 25
condition (5) holds for $f \circ \phi$ about the point $\phi^{-1}(x^*)$. Thus, if $N_f$ converges
locally quadratically to $x^*$ then $N_{f \circ \phi}$ converges locally quadratically to $\phi^{-1}(x^*)$.

Remark 8. Convergence proofs for the Newton method include the Newton- 
Kantorovich theorem (applicable for the Newton method on Banach spaces) 
and the Newton-Mysovskikh theorem; see [|, 12 1 and the bibliographic note 12, 
p. 428]. These theorems give sufficient but not necessary conditions, concen- 
trating instead on giving explicitly a distance from x* within which the Newton 
method is guaranteed to converge. The affine invariance of the Newton method 
is exploited in Q to sharpen these classical results. 



4. The Coordinate Adapted Newton Iteration 

A new optimisation method called the coordinate adapted Newton method 
is proposed. The coordinate adaptation feature allows the domain of attraction, 
the convergence rate and the computational complexity to be modified. These 
ideas are generalised to arbitrary iterative algorithms in subsequent sections. 

4.1. Coordinate Adaptation

Applying a change of coordinates $\phi : \mathbb{R}^n \to \mathbb{R}^n$ to (1) yields the new iteration
function $E_f(x) = \phi \circ N_{f \circ \phi} \circ \phi^{-1}(x)$. Expedient choices of $\phi$ can increase the
domain of attraction, decrease the computational complexity per iteration and
improve the convergence rate. As an extreme example, if $\phi$ is such that $f \circ \phi$
is quadratic then $E_f$ converges in a single iteration. Although Morse's Lemma
guarantees the existence of such a $\phi$ locally, finding it is generally not practical.
This motivates using a different change of coordinates at each iteration, namely
$E_f(x) = \phi_x \circ N_{f \circ \phi_x} \circ \phi_x^{-1}(x)$. When $\phi_x$ varies with $x$, the convergence properties
of $E_f$ need not follow from the convergence properties of $N_f$. Significantly then,
it is established that under mild conditions, $E_f$ converges locally quadratically
to non-degenerate critical points of $f$.

Coordinate adaptation is defined in terms of a function $\phi : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$,
alternatively written $\phi_x(y) = \phi(x,y)$, satisfying the condition that, $\forall x^* \in \mathbb{R}^n$,
$\exists \alpha, \beta, \rho \in \mathbb{R}$, $\rho > 0$, $\forall x, y \in B(x^*;\rho)$, the following hold:

P1 $\|D^2 \phi_x(x)\| \le \alpha$;

P2 $\|\phi_x(y) - y\| \le \beta \|y - x\|^2$.

Implicit in P1 is the requirement that $D^2 \phi_x(x)$ exists, which in turn requires
the existence of $D\phi_x(y)$ for $y$ sufficiently close to $x$.

Given such a $\phi$, the coordinate adapted Newton iteration function is

    $E_f(x) = \phi_x \circ N_{f \circ \phi_x}(x)$    (27)

where $N_f$ is the Newton iteration function (1). This agrees with the earlier
expression for $E_f$ because P2 implies $\phi_x(x) = x$.

Theorem 9. Let $f$ and $x^*$ satisfy the conditions in Theorem 4, including (5).
Let $\phi$ satisfy P1 and P2, defined above. Then the coordinate adapted Newton
iteration function $E_f$, defined in (27), converges locally quadratically to $x^*$.

Proof. P2 implies $\phi_x(x) = x$ and $D\phi_x(x) = I$. Hence

    $D(f \circ \phi_x)(x) = Df(x)$,    (28)
    $D^2(f \circ \phi_x)(x) = D^2 f(x) + Df(x) D^2 \phi_x(x)$.    (29)

Let $G(x)$ be the matrix representation of $Df(x) D^2 \phi_x(x)$. Then $G(x)$ is
symmetric and satisfies $\xi^T G(x) \xi = Df(x) D^2 \phi_x(x) \cdot (\xi, \xi)$ for any $\xi \in \mathbb{R}^n$. If P1 holds, it
can be shown that $\|G(x)\| \le \alpha \|Df(x)\|$. Thus, in a neighbourhood of $x^*$, there
exist constants $r, \gamma > 0$ such that $x \in B(x^*;r)$ implies $\|G(x)\| \le \gamma \|x - x^*\|$.
Since $H_{f \circ \phi_x}(x) = H_f(x) + G(x)$, it follows from Corollary 6 that, for a possibly
smaller $r > 0$, there exists a $\kappa > 0$ such that $\|N_{f \circ \phi_x}(x) - x^*\| \le \kappa \|x - x^*\|^2$
whenever x € B(x*; r). To be able to apply P2, shrink r if necessary to ensure 



< r < p. Then 

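    $\|E_f(x) - x^*\| \le \|\phi_x(N_{f \circ \phi_x}(x)) - N_{f \circ \phi_x}(x)\| + \|N_{f \circ \phi_x}(x) - x^*\|$    (30)
    $\phantom{\|E_f(x) - x^*\|} \le \beta \|N_{f \circ \phi_x}(x) - x\|^2 + \kappa \|x - x^*\|^2$    (31)
    $\phantom{\|E_f(x) - x^*\|} \le \beta \left( \|N_{f \circ \phi_x}(x) - x^*\| + \|x - x^*\| \right)^2 + \kappa \|x - x^*\|^2$    (32)
    $\phantom{\|E_f(x) - x^*\|} \le \left( \beta(\kappa r + 1)^2 + \kappa \right) \|x - x^*\|^2$    (33)

whenever $x \in B(x^*;r)$, proving the theorem. □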



As now explained, P1 and P2 are not only mild, it is conjectured they
cannot be weakened. For (27) to be defined, the Hessian of $f \circ \phi_x$ at $x$ must
exist, necessitating the existence of $D^2 \phi_x(x)$ implicit in P1. The local bound in
P1 ensures $N_{f \circ \phi_x}(x)$ converges locally quadratically. A side-effect of P2 is that
$\phi_x(x) = x$ and $D\phi_x(x) = I$; this loses no generality because the Newton iteration
function is invariant to affine changes of coordinates. The main purpose of P2 is
to prevent the residual term $R_x(y) = \phi_x(y) - y - \frac{1}{2} D^2 \phi_x(x) \cdot (y - x, y - x)$ from
being unbounded locally, ensuring that if $E(x)$ is an arbitrary iteration function
converging locally quadratically to $x^*$ then $\phi_x \circ E(x)$ continues to converge
locally quadratically to $x^*$. The situation in which, for a sufficiently large class
of cost functions $f$, $N_{f \circ \phi_x}(x)$ fails to have local quadratic convergence yet $E_f$
has local quadratic convergence is conjectured to be impossible. The claim that
P1 and P2 are mild comes from the result in Section 4.4 that any $C^2$-smooth
$\phi$ with $\phi_x(x) = x$ and $D\phi_x(x) = I$ satisfies P1 and P2. Furthermore, neither
$\phi(x,y)$ nor $D\phi_x(y)$ need be continuous except on the diagonal $y = x$, and $\phi_x$
need not be locally $C^1$-smooth.

The following example shows that if arbitrary changes of coordinates are 
allowed then the coordinate adapted Newton method may not even be defined, 
much less converge at a quadratic rate. 
Example 10. Let $\beta$ be an arbitrary scalar. Consider the coordinate adapted
Newton iteration function applied to $f(x) = x^2$ using $\phi_x(y) = y + \frac{\beta}{x}(y - x)^2$
when $x \ne 0$ and $\phi_x(y) = y$ when $x = 0$. If $x \ne 0$ then $N_{f \circ \phi_x}(x) = \frac{2\beta}{1 + 2\beta} x$ which
is not defined if $\beta = -\frac{1}{2}$. If $\beta \ne -\frac{1}{2}$ then $E_f(x) = \frac{\beta(3 + 4\beta)}{(1 + 2\beta)^2} x$ which in general
exhibits at best linear convergence.
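
The behaviour in Example 10 can be reproduced numerically. The sketch assumes
the change of coordinates is $\phi_x(y) = y + (\beta/x)(y - x)^2$ for $x \ne 0$, which is a
reading of the example consistent with the singular value $\beta = -1/2$, and it
uses (29) for the second derivative of $f \circ \phi_x$ at $x$. The function name is
hypothetical.

```python
def adapted_newton_example10(x, beta):
    """E_f(x) for f(x) = x**2 with phi_x(y) = y + (beta / x) * (y - x)**2, x != 0."""
    grad = 2.0 * x                          # f'(x); equals (f o phi_x)'(x) since phi_x'(x) = 1
    phi_dd = 2.0 * beta / x                 # phi_x''(x)
    hess = 2.0 + grad * phi_dd              # (f o phi_x)''(x) = f''(x) + f'(x) phi_x''(x)
    n = x - grad / hess                     # Newton step on f o phi_x, taken at x
    return n + (beta / x) * (n - x) ** 2    # map the result through phi_x

beta = 1.0
contraction = [adapted_newton_example10(x, beta) / x for x in (1e-1, 1e-3, 1e-5)]
print(contraction)
```

With $\beta = 1$ the ratio $E_f(x)/x$ stays at 7/9 however small $x$ becomes, so the error
shrinks only linearly. Because $\phi_x''(x) = 2\beta/x$ is unbounded near the origin, this
$\phi$ violates P1.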

4.2. The Generalised Coordinate Adapted Newton Iteration

The proof of Theorem 9 can be modified trivially to prove the following
result. A new function $\psi$ analogous to $\phi$ is introduced and property P2 in
Section 4.1 is replaced by

P2' $\|\psi_x(y) - y\| \le \beta \|y - x\|^2$.

Theorem 11. Let $f$ and $x^*$ satisfy the conditions in Theorem 4, including (5).
Let $\phi : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ satisfy P1 in Section 4.1. Assume further that $\phi_x(x) = x$
and $D\phi_x(x) = I$. Let $\psi : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ satisfy P2' above; the qualifiers for $x$
and $y$ in P2' are the same as for P2. Then the generalised coordinate adapted
Newton iteration function

    $E_f(x) = \psi_x \circ N_{f \circ \phi_x}(x)$    (34)

converges locally quadratically to $x^*$.

Being able to change both $\phi$ and $\psi$ allows greater control over the
computational complexity, the domain of attraction and the rate of convergence of the
iteration function as now discussed.

4.3. Discussion

Any optimisation algorithm will perform well for some cost functions and
poorly for others. The choice of coordinate changes $\phi_x$ and $\psi_x$ in (34) determines
which class of cost functions the generalised coordinate adapted Newton method
will perform well for. The temptation should be resisted, therefore, of thinking
of the generalised coordinate adapted Newton method (34) as an approximation
of any other Newton-like method; it may be inferior for some cost functions but
will be superior for others.

In practice, the challenge is to determine suitable coordinate changes to use
for the class of cost functions at hand. For inherently difficult optimisation
problems this will not be easy by definition. Nevertheless, thinking in terms of
coordinate adaptation leads to the following new strategy.

The Newton method has the property that the closer the cost function is
to being quadratic, the faster the rate of convergence. This suggests choosing
$\phi_x$ in (27) to make $f \circ \phi_x$ approximately quadratic. For example, assume that
$f$ is of the form $f(x) = \psi(x - z)$ for some unknown scalar $z$, where $\psi$ has a
minimum at the origin; minimising $f$ is equivalent to determining the value of
$z$. The choice of $\phi$ to make $f \circ \phi$ (approximately) quadratic depends on $z$, which
is unknown. However, if the optimisation algorithm works, $x$ will converge to
$z$, hence we can assume that $x$ approximately equals $z$ and thus choose $\phi_x$ to
make $f \circ \phi_x$ (approximately) quadratic under the assumption that $z = x$. This
is the rationale for allowing $\phi_x$ to depend on $x$.

Example 12. Consider the family of cost functions $f(x;z) = (x - z)^2 + 2(x - z)^3$.
The coordinate adapted Newton iteration function using the coordinate systems
$\phi_x(y) = y - (y - x)^2$ is $E_f(x;z) = z - 8(x - z)^3 + \cdots$. This shows that the
coordinate adapted Newton method applied to any member of the family of cost
functions $f(x;z)$ locally converges at a cubic rate to the critical point $x^* = z$.
(This was arranged by choosing $\phi_x(y)$ so that $f \circ \phi_z(x) = (x - z)^2 - 5(x - z)^4 + \cdots$
has no cubic term.)
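
Example 12 can be checked numerically. The sketch below implements (27) for
$\phi_x(y) = y - (y - x)^2$ using (28) and (29), so only $f'$ and $f''$ are evaluated, and
compares it with the plain Newton iteration (1). The value $z = 0.3$ and the
function names are assumptions made for illustration.

```python
def grad(x, z):
    """f'(x) for f(x; z) = (x - z)**2 + 2 (x - z)**3."""
    d = x - z
    return 2.0 * d + 6.0 * d * d

def hess(x, z):
    """f''(x) for the same cost."""
    return 2.0 + 12.0 * (x - z)

def newton(x, z):
    """Plain Newton iteration (1)."""
    return x - grad(x, z) / hess(x, z)

def adapted_newton(x, z):
    """Coordinate adapted Newton iteration (27) with phi_x(y) = y - (y - x)**2.
    Since phi_x(x) = x, phi_x'(x) = 1 and phi_x''(x) = -2, the derivatives of
    f o phi_x at x are f'(x) and f''(x) - 2 f'(x)."""
    n = x - grad(x, z) / (hess(x, z) - 2.0 * grad(x, z))
    return n - (n - x) ** 2      # apply phi_x to the Newton step

z = 0.3                          # assumed location of the minimum
for d in (1e-1, 1e-2, 1e-3):
    x = z + d
    print(d, (newton(x, z) - z) / d ** 2, (adapted_newton(x, z) - z) / d ** 3)
```

The second column stays near 3 (quadratic convergence of the plain Newton
iteration) and the third approaches -8, matching the leading term
$-8(x - z)^3$ stated in the example.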

If the domain of attraction is of primary concern, then similar intuition
suggests choosing $\phi_x$ such that, for any $f$ belonging to the given class of cost
functions, $f \circ \phi_x$ has a relatively large domain of attraction, especially if $x$ is at
all close to the minimum of $f$.

The extra freedom afforded by $\psi_x$ in (34) can be used to reduce the
computational complexity per iteration without compromising the rate of convergence;
in some cases, an expedient choice of $\psi_x$ leads to cancellations, so $\psi_x \circ N_{f \circ \phi_x}$
becomes less computationally intensive to evaluate than $N_{f \circ \phi_x}$ on its own.

The coordinate adapted Newton method is different from variable metric
methods. Variable metric methods explicitly or implicitly perform a change
of coordinates and then take a steepest-descent step in the new coordinate
system. They do not evaluate the Hessian of the cost function but instead
build up an approximation $B_k$ to the Hessian from current and past gradient
information. They take the general form $x_{k+1} = x_k - [B_k]^{-1} \nabla f(x_k)$.
The generalised coordinate adapted Newton method (34) with $\psi_x(y) = y$ can
be written as $E_f(x) = x - [H_f(x) + G(x)]^{-1} \nabla f(x)$; see the proof of Theorem 9.
This differs from a variable metric method in several ways: $E_f$ makes use of the
Hessian of $f$ but $B_k$ does not; $E_f$ has no "memory" but $B_k$ is built up over time;
variable metric methods generally only achieve super-linear convergence whereas
$E_f$ has quadratic convergence. The philosophy is also different; variable metric
methods wish for $B_k$ to be as close as possible to the true Hessian, whereas the
generalised coordinate adapted Newton method intentionally uses a perturbed
version of the true Hessian to improve the performance of the algorithm.

4.4. Local Parametrisations

The normalisation $\phi_x(x) = x$ used in Section 4.1 does not generalise well
to the manifold setting. Henceforth it will be convenient to work instead with
$h_x(y) = \phi_x(x + y)$. The corresponding normalisation is $h_x(0) = x$. Properties
H1 to H3 below are the analogues of properties P1 and P2 in Section 4.1.

Consider a function $h : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ which need not be defined everywhere;
its domain of definition will be clarified shortly. Such a function is said to satisfy
H1$_\rho$ if $x \in B(0;\rho)$ implies $h_x(0) = x$ and $Dh_x(0) = I$. It satisfies H2$_\rho$ if there
exists a constant $\alpha \in \mathbb{R}$ such that $\|D^2 h_x(0)\| \le \alpha$ for every $x \in B(0;\rho)$. It
satisfies H3$_\rho$ if there exists a constant $\beta \in \mathbb{R}$ such that $x, y \in B(0;\rho)$ implies
$\|h_x(y) - x - y\| \le \beta \|y\|^2$. If the subscript $\rho$ is omitted, the existence of an
appropriate $\rho > 0$ is implied.

For the derivatives to exist, if $h$ satisfies H1$_\rho$ or H2$_\rho$ then its domain of
definition must include a set of the form $\{(x,y) \mid x \in B(0;\rho),\ y \in B(0;\delta_x),\ \delta_x >
0\}$ where $\delta_x$ is a function of $x$. Such a set need not contain a neighbourhood
of the origin. For H3$_\rho$ though, it is required that $B(0;\rho) \times B(0;\rho)$ lies in the
domain of $h$.

Choosing $h_x(y) = x + y + y^2$ if $x$ is rational and $h_x(y) = x + y - y^2$ if $x$ is
irrational shows that H1-H3 do not imply continuity of $h$. Conversely, $h$ being
$C^1$-smooth and satisfying H1 and H2 need not imply H3.

Example 13. Let $a : \mathbb{R} \to \mathbb{R}$ be a $C^2$-smooth (or even $C^\infty$-smooth) bump
function satisfying: $0 \le a(t) \le 1$; $a(t) = a'(t) = 0$ for $t \notin (1/2, 1)$; $a(3/4) = 1$.
Let h(x, y) = x+y+x^^a^ / x 2 )y 2 if x > and h(x, y) = x+y otherwise. Then 
differentiation shows that $h(x,y)$ is $C^1$-smooth in $(x,y)$. Furthermore, $h_x(0) =
x$, $Dh_x(0) = 1$ and $D^2 h_x(0) = 0$. Therefore, H1 and H2 are satisfied, but H3 is
not; if $x_n \to 0$ with $x_n > 0$ and $y_n = (3/4) x_n^2$ then $(h_{x_n}(y_n) - x_n - y_n) \, y_n^{-2} \to \infty$.

Nevertheless, a corollary of Lemma 14 is that $h$ being $C^2$-smooth, or even
just $D^2 h_x(y)$ being continuous in $(x,y)$, suffices for H1 to imply H2 and H3.

Lemma 14. If, for $x, y \in B(0;\rho)$, $D^2 h_x(y)$ is bounded in $(x,y)$ and continuous
in $y$ (that is, for each $x$, $h_x(y)$ is $C^2$-smooth in $y$) then $h$ satisfying H1$_\rho$ implies
it satisfies H2$_\rho$ and H3$_\rho$.

Proof. Let $\alpha = \sup_{x,y \in B(0;\rho)} \|D^2 h_x(y)\|$; then H2$_\rho$ is satisfied. Since $h_x(0) = x$
and $Dh_x(0) = I$, Taylor's theorem about $y = 0$ implies
$h_x(y) = x + y + \frac{1}{2} D^2 h_x(ty) \cdot (y, y)$ for some $t \in [0,1]$.
Thus, H3$_\rho$ holds with $\beta = \alpha/2$. □

If $h(y) = y + y^3 \sin(1/y)$ then $|h(y) - y| \le |y|^2$ whenever $|y| < 1$, however,
$D^2 h(0)$ does not exist. This puts Lemma 15 into context.

Lemma 15. If $h$ satisfies H3$_\rho$ then it satisfies H1$_\rho$, and if additionally $D^2 h_x(0)$
exists for $x \in B(0;\rho)$ then $h$ satisfies H2$_\rho$ (with $\alpha = 2\beta$).

Proof. That H3$_\rho$ implies H1$_\rho$ is clear. If $D^2 h_x(0)$ exists, it is known that

    $\lim_{\|y\| \to 0} \|h_x(y) - 2h_x(0) + h_x(-y) - D^2 h_x(0) \cdot (y,y)\| \, \|y\|^{-2} = 0$.    (35)

Thus, for any $\epsilon > 0$ there is a $\delta > 0$ such that $\|h_x(y) - 2x + h_x(-y) - D^2 h_x(0) \cdot
(y,y)\| \le \epsilon \|y\|^2$ whenever $\|y\| < \delta$. Then $\|D^2 h_x(0) \cdot (y,y)\| \le \epsilon \|y\|^2 + \|h_x(y) -
x - y\| + \|h_x(-y) - x - (-y)\| \le (\epsilon + 2\beta) \|y\|^2$, proving the result; both sides
scale as $\|y\|^2$ and $\epsilon > 0$ was arbitrary. □

Lemma 16 asserts that H3$_\rho$ is preserved under second-order changes to $h$;
the straightforward proof is omitted.

Lemma 16. For some $\rho > 0$, assume $h$ satisfies H3$_\rho$. If there exists a $\gamma \in \mathbb{R}$
such that $\tilde{h}$ satisfies $\|\tilde{h}_x(y) - h_x(y)\| \le \gamma \|y\|^2$ whenever $x, y \in B(0;\rho)$ then $\tilde{h}$
satisfies H3$_\rho$.
The following two technical lemmata will be required in subsequent proofs;
Lemma 17 is well-known.

Lemma 17. Given $g : \mathbb{R}^n \to \mathbb{R}^m$ and $\delta > 0$, define $L = \sup_{z \in B(0;\delta)} \|Dg(z)\|$
and $M = \sup_{z \in B(0;\delta)} \frac{1}{2} \|D^2 g(z)\|$. If $g$ is $C^1$-smooth on $B(0;\delta)$ and $L$ is finite
then $\|g(x) - g(y)\| \le L \|x - y\|$, and if $g$ is $C^2$-smooth on $B(0;\delta)$ and $M$ is finite
then $\|g(x) - g(y) - Dg(y) \cdot (x - y)\| \le M \|x - y\|^2$ and $\|Dg(x) - Dg(y)\| \le
2M \|x - y\|$ for $x, y \in B(0;\delta)$. If $g$ is $C^1$-smooth on $\bar{B}(0;\delta)$, meaning it is
$C^1$-smooth on an open set $U \supset \bar{B}(0;\delta)$, then $L$ is finite, and $M$ is finite if $g$ is
$C^2$-smooth on $\bar{B}(0;\delta)$.

Lemma 18. Fix a dimension $n$. Given scalars $\rho_1, \rho_2, \beta_1, G, L, M > 0$, there
exist $\rho, \beta > 0$ such that, for any $h : B_n(0;\rho_1) \to \mathbb{R}^n$ satisfying $\|h(y) - y\| \le \beta_1 \|y\|^2$
for $y \in B(0;\rho_1)$, and for any $g : B_n(0;\rho_2) \to \mathbb{R}^n$ that is a $C^2$-diffeomorphism
onto its image and satisfies $g(0) = 0$, $\|[Dg(0)]^{-1}\| \le G$, $\|Dg(0)\| \le L$ and
$\sup_{z \in B(0;\rho_2)} \frac{1}{2} \|D^2 g(z)\| \le M$, it follows that $\tilde{h}(y) = (g \circ h \circ [Dg(0)]^{-1})(y)$ is
defined for $y \in B(0;\rho)$ and satisfies $\|\tilde{h}(y) - y\| \le \beta \|y\|^2$.

Proof. For brevity, define $A = [Dg(0)]^{-1}$. By successively shrinking $\rho > 0$ as
required, the following requirements can be met for all $y \in B(0;\rho)$: $\|Ay\| < \rho G <
\rho_1$; $\|h(Ay) - Ay\| \le \beta_1 \|Ay\|^2 \le \beta_1 G^2 \|y\|^2$; $\|A^{-1} h(Ay) - y\| \le \beta_1 L G^2 \|y\|^2$;
$\|h(Ay)\| \le (1 + \beta_1 \rho G) G \|y\| < \rho_2$; $\|g(h(Ay)) - A^{-1} h(Ay)\| \le M \|h(Ay)\|^2 \le
M G^2 (1 + \rho \beta_1 G)^2 \|y\|^2$ (Lemma 17); and finally $\|\tilde{h}(y) - y\| \le \|\tilde{h}(y) - A^{-1} h(Ay)\| +
\|A^{-1} h(Ay) - y\| \le M G^2 (1 + \rho \beta_1 G)^2 \|y\|^2 + \beta_1 L G^2 \|y\|^2$. Importantly, an
appropriate value of $\rho$ can be determined as a function of the other scalars and does
not depend on $g$ or $h$. Similarly, $\beta = M G^2 (1 + \rho \beta_1 G)^2 + \beta_1 L G^2$ suffices. □

In certain situations, such as in Section 8.2, $h_x$ is constructed from
transformed versions of a prototype $h$, as in Lemma 19.

Lemma 19. Let $h : \mathbb{R}^n \to \mathbb{R}^n$ restricted to $B(0;\rho_1)$ satisfy $\|h(y) - y\| \le \beta \|y\|^2$
for some $\beta \in \mathbb{R}$. Assume $D^2 h(0)$ exists. Define $\tilde{h}_x(y) = g_x \circ h([Dg_x(0)]^{-1} \cdot y)$
where, for each $x \in B(0;\rho_3) \subset \mathbb{R}^n$, $g_x : \mathbb{R}^n \to \mathbb{R}^n$ restricted to $B(0;\rho_2)$ is a
$C^2$-diffeomorphism satisfying $g_x(0) = x$, $\|Dg_x(0)\| \le L$ and $\|[Dg_x(0)]^{-1}\| \le G$
for some $G, L \in \mathbb{R}$. Assume $M = \sup_{x \in B(0;\rho_3),\, y \in B(0;\rho_2)} \frac{1}{2} \|D^2 g_x(y)\| < \infty$.
Here, $\rho_1, \rho_2, \rho_3 > 0$. Then $\tilde{h}$ satisfies H1, H2 and H3.

Proof. Since $D^2 h(0)$ exists and $g_x$ is $C^2$-smooth, $D^2 \tilde{h}_x(0)$ exists. By Lemma 15
it suffices to prove $\tilde{h}$ satisfies H3. Fix $x \in B(0;\rho_3)$. Choose $\rho$ and $\beta$ as in
Lemma 18 with $g(z) = g_x(z) - x$. Then $\|\tilde{h}_x(y) - x - y\| = \|(g \circ h \circ [Dg(0)]^{-1})(y) -
y\| \le \beta \|y\|^2$ whenever $\|y\| < \rho$. Therefore $\tilde{h}$ satisfies H3$_{\min\{\rho, \rho_2\}}$. □

Properties H1-H3 are preserved under a change of coordinates.

Lemma 20. Let $g : \mathbb{R}^n \to \mathbb{R}^n$ restricted to $B(0;\rho_3)$ be a $C^2$-diffeomorphism
onto its image, with $g(0) = 0$. Given a function $h : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$, define
$\tilde{h}_x(y) = g \circ h_{g^{-1}(x)}(D(g^{-1})(x) \cdot y)$. If $h$ satisfies H1 and H2 then $\tilde{h}$ satisfies H1
and H2. If $h$ satisfies H3 then $\tilde{h}$ satisfies H3.
Proof. Assume first that h satisfies H3 Pl . Choose a p 2 such that < p 2 < 4f
and B(Q;p 2 ) C g(B(Q; mia{pi, Fix an i £ 5(0; p 2 ). Define %) = 

h g -i/ x \(y) — g~ 1 (x) and g(z) = <3>(z + .g~ 1 (a;)) — x. Note that is well- 

defined for \\y\\ < pi and g(z) is well-defined for ||z|| < p 2 - Then Lemma 1 1 81 is 
applicable, with g replacing g. (By shrinking p 3 if necessary, it can be assumed 
the derivatives of g are uniformly bounded.) In particular, there exist p and ft, 
independent of x, such that \\h x (y)— x— y\\ — || (goho [Dg(0)]~ 1 )(y) — y\\ < ft\\y\\ 2 
whenever y g B(0;p). Therefore h satisfies iL3 m m{p,p 2 }) as required. 

Next, assume h satisfies HI and H2 (but not necessarily H3). It is reasonably 
clear that h x (y) has a sufficiently large domain of definition required for h x (0), 
Dh x (0) and D 2 h x (0) to exist in a neighbourhood of x = 0. Explicit calculations, 
using the chain rule to compute derivatives, verify that h satisfies HI and H2. □ 

It is remarked that the tedious nature of the last few proofs comes from the 
necessity of ensuring the transformed h has a valid domain of definition. This 
is a consequence of the standing assumption that h itself need not be defined 
on the whole of $\mathbb{R}^n \times \mathbb{R}^n$. This becomes important when coordinate charts on
manifolds enter the picture. 

5. Rate of Convergence of Iterates on Manifolds 

This brief section motivates then defines the rate of convergence of iterates 
generated by an iteration function on a manifold. 

Convergence rates are not preserved by arbitrary homeomorphisms. A suf- 
ficient condition for rates K > 1 is the following. 

Lemma 21. Let $N$ be an iteration function on $\mathbb{R}^n$ which converges locally to
$x^*$ with rate $K > 1$ and constant $\kappa$. Let $U$ be a neighbourhood of $x^*$ and
$\phi : U \to V \subset \mathbb{R}^n$ a bi-Lipschitz homeomorphism about $x^*$, meaning there exist
positive constants $\alpha, \beta \in \mathbb{R}$ such that

$\forall x \in U, \quad \frac{1}{\alpha}\|x - x^*\| < \|\phi(x) - \phi(x^*)\| < \beta\|x - x^*\|$. (36)

Then $\tilde N = \phi \circ N \circ \phi^{-1}$ converges locally to $\phi(x^*)$ with rate $K$ and constant
$\alpha^K \beta \kappa$.

Proof. As noted in Section ??, since $N$ converges locally to $x^*$, for all sufficiently
small balls $B$ centred at $x^*$, $N$ is defined on $B$, and $x \in B$ implies $N(x) \in B$
and $\|N(x) - x^*\| < \kappa\|x - x^*\|^K$. Choose such a $B$ contained in $U$. Since $\phi$ is
a homeomorphism, $Y = \phi(B)$ is a non-empty open subset of $V$. If $y \in Y$ then
$\tilde N(y)$ is well-defined and contained in $Y$, and
$\|\tilde N(y) - \phi(x^*)\| < \beta\|N(\phi^{-1}(y)) - x^*\| < \beta\kappa\|\phi^{-1}(y) - x^*\|^K < \alpha^K\beta\kappa\|y - \phi(x^*)\|^K$. □
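A small numerical illustration of Lemma 21 may help. The sketch below is not taken from the paper: it assumes, purely for illustration, that $N$ is the Newton iteration for the root of $x^2 - 2$ on $\mathbb{R}$ and that $\phi = \exp$, which is bi-Lipschitz on any bounded interval. It records the ratios $e_{k+1}/e_k^2$ for both $N$ and $\phi \circ N \circ \phi^{-1}$; both stay bounded, as the lemma predicts for $K = 2$.

```python
import math

def newton_sqrt2(x):
    """Euclidean Newton iteration for the root of x**2 - 2 (quadratic rate)."""
    return x - (x * x - 2.0) / (2.0 * x)

def conjugated(y):
    """N~ = phi o N o phi^{-1} with the illustrative choice phi = exp."""
    return math.exp(newton_sqrt2(math.log(y)))

def error_ratios(step, start, limit, n_steps):
    """Return the ratios e_{k+1} / e_k**2, where e_k = |x_k - limit|."""
    ratios, x = [], start
    for _ in range(n_steps):
        nxt = step(x)
        e_old, e_new = abs(x - limit), abs(nxt - limit)
        # Stop once rounding error dominates the measured error.
        if e_old == 0.0 or e_new < 1e-13:
            break
        ratios.append(e_new / e_old ** 2)
        x = nxt
    return ratios

x_star = math.sqrt(2.0)
y_star = math.exp(x_star)
ratios_x = error_ratios(newton_sqrt2, 1.6, x_star, 6)
ratios_y = error_ratios(conjugated, math.exp(1.6), y_star, 6)
print(ratios_x)
print(ratios_y)
```

On the interval $[\sqrt{2}, 1.6]$ one may take $\beta = e^{1.6}$ and $1/\alpha = e^{\sqrt{2}}$ in (36), so the second list should stay below $\alpha^2\beta\kappa$ with $\kappa = 1/(2\sqrt{2})$.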

A significantly stronger condition is required if $K = 1$. One such example is
the following.

Lemma 22. Let $N$ be an iteration function on $\mathbb{R}^n$ converging locally to $x^*$
at a linear rate. Let $U$ be a neighbourhood of $x^*$ and $\phi : U \to V \subset \mathbb{R}^n$ a
$C^1$-diffeomorphism whose differential $D\phi$ at $x^*$ is proportional to the identity.
Then $\tilde N = \phi \circ N \circ \phi^{-1}$ converges locally to $\phi(x^*)$ at a linear rate.

Proof. Let $\gamma \in \mathbb{R}$ be such that $D\phi(x^*) \cdot \xi = \gamma\xi$ for $\xi \in \mathbb{R}^n$. Note $\gamma \neq 0$
because $\phi$ is a diffeomorphism. Since $\phi(x) - \phi(x^*) = \gamma(x - x^*) + r(x)$ where
$\lim_{x \to x^*} \|r(x)\|/\|x - x^*\| = 0$, by shrinking $U$ to become a sufficiently small
neighbourhood of $x^*$, it can be arranged for (36) to hold with $\beta = |\gamma| + \epsilon$ and
$\alpha^{-1} = |\gamma| - \epsilon$ for any sufficiently small $\epsilon > 0$. The result follows from Lemma 21
by choosing $\epsilon$ so that $\alpha\beta\kappa < 1$, where $\kappa < 1$ is the constant associated with $N$. □
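To see why a condition as strong as that of Lemma 22 is natural when $K = 1$, consider the following sketch. It is an illustrative example chosen here, not one from the paper: the linear iteration $N(x) = Ax$ on $\mathbb{R}^2$ with $A = \begin{pmatrix} 0 & 0.9 \\ 0 & 0 \end{pmatrix}$ contracts with constant $0.9$, a change of coordinates proportional to the identity keeps that constant, and the skewed linear map $\phi = \mathrm{diag}(10, 1)$ produces a one-step expansion by a factor of about $9$.

```python
import math

def norm(v):
    return math.hypot(v[0], v[1])

def N(x):
    """Linear iteration N(x) = A x with A = [[0, 0.9], [0, 0]]; fixed point 0."""
    return (0.9 * x[1], 0.0)

def conjugate(sx, sy):
    """Return phi o N o phi^{-1} for the linear map phi = diag(sx, sy)."""
    def N_tilde(y):
        x = (y[0] / sx, y[1] / sy)        # phi^{-1}(y)
        nx = N(x)
        return (sx * nx[0], sy * nx[1])   # phi(N(x))
    return N_tilde

y = (0.0, 1.0)
rate_original = norm(N(y)) / norm(y)                    # contraction factor of N
rate_scaled = norm(conjugate(10.0, 10.0)(y)) / norm(y)  # phi = 10 * identity
rate_skewed = norm(conjugate(10.0, 1.0)(y)) / norm(y)   # phi = diag(10, 1)
print(rate_original, rate_scaled, rate_skewed)
```

The skewed iterates still reach the fixed point (here after two steps, because $A^2 = 0$), but the bound $\|\tilde N(y)\| \le \kappa\|y\|$ with $\kappa < 1$ no longer holds, so the linear rate in the sense used above is not preserved.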

The above suggests the following definition. An iteration function $E : M \to M$
on an $n$-dimensional manifold $M$ is said to converge locally with rate $K > 1$
to $p^*$ with respect to the homeomorphism $\varphi : W \subset M \to V \subset \mathbb{R}^n$, where
$p^* \in W$, if $\varphi \circ E \circ \varphi^{-1}$, as an iteration function on $\mathbb{R}^n$, converges locally with
rate $K$ to $\varphi(p^*)$.

If $K = 1$ or $M$ is only a topological manifold, there is no distinguished
choice of homeomorphism $\varphi$ with respect to which convergence can be defined.
If $M$ is a $C^1$-manifold, $K > 1$ and an iterate converges with respect to one
coordinate chart $\varphi$ then Lemma 21 implies it converges with respect to any
other chart $\psi$. (Proof: If $N = \varphi \circ E \circ \varphi^{-1}$ converges then, since $\psi \circ \varphi^{-1}$
is $C^1$ and hence bi-Lipschitz on a possibly smaller domain,
$\psi \circ E \circ \psi^{-1} = (\psi \circ \varphi^{-1}) \circ (\varphi \circ E \circ \varphi^{-1}) \circ (\psi \circ \varphi^{-1})^{-1}$ converges too.)
Definition 23 affords a coordinate independent definition of rate of convergence.

Definition 23. An iteration function $E : M \to M$ on a $C^1$-differentiable manifold
converges locally with rate $K > 1$ to $p^* \in M$ if there exists a coordinate
chart $\varphi : W \to V \subset \mathbb{R}^n$ defined on a neighbourhood of $p^*$ such that $\varphi \circ E \circ \varphi^{-1}$
converges locally with rate $K$ to $\varphi(p^*)$ as an iteration function on $\mathbb{R}^n$.

6. Generalised Newton Methods on Manifolds 

Throughout this section, $f : M \to \mathbb{R}$ will be a $C^2$-smooth cost function
defined on an $n$-dimensional $C^2$-differentiable manifold $M$. Recall from Section 5
that if $E : M \to M$ is an iteration function with local quadratic convergence to
a point $p^* \in M$ then $\varphi \circ E \circ \varphi^{-1}$ converges locally quadratically to $\varphi(p^*)$ for
any chart $(U,\varphi)$ with $p^* \in U$. Fix $(U,\varphi)$. For cost functions $f$ with a critical
point in $U$, the coordinate adapted Newton method of Section 4 can be extended
to manifolds by seeking an $E_f$ such that $\varphi \circ E_f \circ \varphi^{-1}$ is a coordinate adapted
Newton iteration function for the equivalent cost function $f \circ \varphi^{-1}$. For functions
with critical points outside $U$, in principle a different coordinate chart needs to
be taken, but as shown presently, it is straightforward to guess an appropriate
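form for $E_f$ globally.

Solving $\varphi \circ E_f \circ \varphi^{-1}(x) = \psi_x \circ N_{(f \circ \varphi^{-1}) \circ \phi_x} \circ \phi_x^{-1}(x)$ yields

$E_f(p) = (\varphi^{-1} \circ \psi_{\varphi(p)}) \circ N_{f \circ \varphi^{-1} \circ \phi_{\varphi(p)}} \circ (\varphi^{-1} \circ \phi_{\varphi(p)})^{-1}(p)$ (37)
$\phantom{E_f(p)} = \tilde\psi_p \circ N_{f \circ \tilde\phi_p} \circ \tilde\phi_p^{-1}(p), \quad \tilde\psi_p(z) = \varphi^{-1} \circ \psi_{\varphi(p)}(z)$, (38)

with $\tilde\phi_p$ defined analogously. The affine invariance of the Newton method allows
this to be rewritten as

$E_f(p) = \tilde\psi_p \circ N_{f \circ \tilde\phi_p} \circ \tilde\phi_p^{-1}(p) = \psi_p \circ N_{f \circ \phi_p}(0)$ where $\psi_p(z) = \tilde\psi_p(z + \varphi(p))$,

and likewise for $\phi_p$.

Although this defines $E_f$ only locally, an obvious extension is
$E_f(p) = \psi_p \circ N_{f \circ \phi_p}(0)$ where, for each $p \in M$, $\psi_p : \mathbb{R}^n \to M$ (and likewise $\phi_p$)
is a parametrisation of a neighbourhood on the $n$-dimensional manifold $M$ centred
at $p$, that is, $\psi_p(0) = p$. This extension is justified by the proof of Theorem 24
in which it is shown that $\varphi \circ E_f \circ \varphi^{-1}$ does indeed take the form of a coordinate
adapted Newton method for any chart $\varphi$ on $M$.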

Although tempting to generalise $\phi$ and $\psi$ in Section 3 to maps from $M \times \mathbb{R}^n$ to
$M$, the global geometry of $M$ can prevent any such map from being smooth. The
tangent bundle $TM$, being equivalent to $M \times \mathbb{R}^n$ locally, offers an alternative.
As $TM$ twists in the "right" way, smooth parametrisations from $TM$ to $M$ can
be anticipated to exist; see also Section 7. (While smoothness is not essential,
in practice it may be convenient to work with smooth parametrisations.)

A general framework for lifting the Newton method to a manifold can now 
be stated. As just explained, it is based on the coordinate adapted Newton 
method introduced in Section 4.1.

The functions $\phi : TM \to M$ and $\psi : TM \to M$ will be required to satisfy
conditions C1-C2 below, which generalise P1 and P2' in Sections 4.1 and 4.2.
Local coordinates are needed. Let $\pi : TM \to M$ be the projection taking
a tangent vector $v_p \in T_pM$ to its base point $p$. A $C^2$-chart $(U, \varphi)$ induces
the $C^1$-chart $\tau_\varphi : \pi^{-1}(U) \to \mathbb{R}^n \times \mathbb{R}^n$ on $TM$, sending $v_p$ to $(\varphi(p), A_p(v_p))$
where $A_p : T_pM \to \mathbb{R}^n$ is the linear isomorphism taking
$v_p = \sum_{j=1}^n v^j \frac{\partial}{\partial x^j}$ to $v = (v^1, \ldots, v^n)$.
The local coordinate representation of $\phi$ is $\hat\phi = \varphi \circ \phi \circ \tau_\varphi^{-1}$.

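Formally, conditions C1-C2 are satisfied if, $\forall p \in M$, $\exists$ $C^2$-chart $(U, \varphi)$ with
$\varphi(p) = 0$, $\exists \rho > 0$:

C1 $\hat\phi = \varphi \circ \phi \circ \tau_\varphi^{-1}$ satisfies H1$_\rho$ and H2$_\rho$ of Section 4.4.

C2 $\hat\psi = \varphi \circ \psi \circ \tau_\varphi^{-1}$ satisfies H3$_\rho$ of Section 4.4.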

A generalised Newton iteration function is any $E_f : M \to M$ of the form

$E_f(p) = \psi_p \circ N_{f \circ \phi_p}(0_p)$ (39)

where $\phi_p : T_pM \to M$ and $\psi_p : T_pM \to M$ are the restrictions of $\phi$ and $\psi$ to
the tangent space $T_pM$ at the point $p$ on $M$. In (39), $N$ represents the Newton
iteration (1) but on the abstract vector space $T_pM$ rather than $\mathbb{R}^n$.
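The form (39) can be made concrete on the unit circle $S^1 \subset \mathbb{R}^2$. The sketch below is an illustration under assumptions that are not in the paper: the cost is $f(x,y) = x + \tfrac{1}{2}xy$ restricted to the circle, $\phi_p(t) = p\cos t + \tau_p \sin t$ (with $\tau_p$ the unit tangent at $p$) and $\psi_p(t)$ is the projection of $p + t\tau_p$ onto the circle. The tangent space is identified with $\mathbb{R}$ through $\tau_p$, so the Newton step $N_{f \circ \phi_p}(0_p)$ is a scalar.

```python
import math

def grad_f(p):
    """Gradient of the ambient cost f(x, y) = x + 0.5 * x * y (illustrative choice)."""
    x, y = p
    return (1.0 + 0.5 * y, 0.5 * x)

def tangent(p):
    """Unit tangent to the circle at p (p rotated by 90 degrees)."""
    return (-p[1], p[0])

def newton_step_in_tangent_space(p):
    """N_{f o phi_p}(0) for phi_p(t) = p cos t + tau sin t."""
    tau = tangent(p)
    g = grad_f(p)
    d1 = g[0] * tau[0] + g[1] * tau[1]              # (f o phi_p)'(0)
    # The Hessian of f is [[0, 0.5], [0.5, 0]], so tau^T H tau = tau_x * tau_y;
    # the curvature of phi_p contributes -grad f(p) . p.
    d2 = tau[0] * tau[1] - (g[0] * p[0] + g[1] * p[1])  # (f o phi_p)''(0)
    return -d1 / d2

def psi(p, t):
    """psi_p(t): project p + t * tau from the affine tangent line onto the circle."""
    tau = tangent(p)
    s = math.sqrt(1.0 + t * t)
    return ((p[0] + t * tau[0]) / s, (p[1] + t * tau[1]) / s)

def generalised_newton(p):
    """One step E_f(p) = psi_p(N_{f o phi_p}(0_p))."""
    return psi(p, newton_step_in_tangent_space(p))

p = (math.cos(0.8), math.sin(0.8))
history = []
for _ in range(6):
    p = generalised_newton(p)
    history.append(p)
print(history[-1])
```

For this cost the critical points satisfy $\sin^2\theta + \sin\theta - \tfrac12 = 0$, so the iterates started at angle $0.8$ should approach the point with $\sin\theta = (\sqrt{3}-1)/2$. Using different maps for $\phi_p$ and $\psi_p$ is deliberate: the framework only asks that each satisfies its own condition.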

The local coordinate representation $\hat\phi = \varphi \circ \phi \circ \tau_\varphi^{-1}$ of $\phi$ can be written
as $\hat\phi(x,y) = \varphi \circ \phi_{\varphi^{-1}(x)} \circ A^{-1}_{\varphi^{-1}(x)}(y)$, alternatively denoted $\hat\phi_x(y)$. The local
coordinate representation of $\phi_p$ is $\varphi \circ \phi_p \circ A_p^{-1}$, which can be written in terms
of $\hat\phi$, namely, $\hat\phi_{\varphi(p)}$. Analogously for $\psi_p$.

Theorem 24. Let $f : M \to \mathbb{R}$ be a $C^2$-smooth cost function on a $C^2$-smooth
manifold $M$. Let $p^* \in M$ be a non-degenerate critical point, that is, $Df(p^*) = 0$
and if $D^2f(p^*) \cdot \xi = 0$ then $\xi = 0$. Assume there is a chart $(U,\varphi)$ on $M$,
with $\varphi(p^*) = 0$, and an $\eta > 0$ such that $\hat f = f \circ \varphi^{-1}$ satisfies, for $x \in \varphi(U)$,

$\|[H_{\hat f}(x) - H_{\hat f}(0)]\,x\| \le \eta\|x\|^2$. (40)

Let $\phi, \psi : TM \to M$ satisfy C1-C2, defined above. Then the generalised Newton
iteration function (39) converges locally quadratically to $p^*$.

Proof. Let $(U, \varphi)$ be as in the theorem. Proposition 26 implies there exists a
$\rho > 0$ such that $\hat\phi = \varphi \circ \phi \circ \tau_\varphi^{-1}$ satisfies H1$_\rho$ and H2$_\rho$, and $\hat\psi = \varphi \circ \psi \circ \tau_\varphi^{-1}$
satisfies H3$_\rho$. (By Lemma 15, $\hat\psi$ will also satisfy H1$_\rho$.) Let $h_x(y) = y - x$,
$\tilde\phi_x(y) = \hat\phi(x, y - x) = \hat\phi_x \circ h_x(y)$ and $\tilde\psi_x(y) = \hat\psi(x, y - x) = \hat\psi_x \circ h_x(y)$.
The invariance of the Newton iteration function to the affine coordinate change
$h_x^{-1} \circ A_{\varphi^{-1}(x)}$ can be used to show

$\varphi \circ E_f \circ \varphi^{-1}(x) = \tilde\psi_x \circ N_{\hat f \circ \tilde\phi_x}(x)$. (41)

The functions $\tilde\phi$, $\tilde\psi$ and $\hat f$ satisfy the necessary conditions locally about the point
$x = 0$ for the proof of Theorem ?? to go through. □

If $f$ is $C^3$-smooth then (40) holds.

6.1. Intrinsic Conditions 

Condition (40) does not depend on the choice of coordinates.

Proposition 25. In Theorem 24, if (40) holds, it holds with respect to any
$C^2$-chart $(\tilde U, \tilde\varphi)$ with $\tilde\varphi(p^*) = 0$ and $\tilde U$ sufficiently small.

Proof. Referring to Theorem 24, let $(\tilde U, \tilde\varphi)$ be a chart with $\tilde\varphi(p^*) = 0$ and
choose $\rho > 0$ so that $h = \varphi \circ \tilde\varphi^{-1}$ is well-defined on $B(0;\rho)$. Then
$H_{f \circ \tilde\varphi^{-1}}(x) = H_{\hat f \circ h}(x) = A_x^T H_{\hat f}(h(x)) A_x + G_x$ where $A_x$ and $G_x$ are the matrix
representations of $Dh$ and $(D\hat f \circ h) D^2h$ respectively. Since $Dh$ and $D\hat f \circ h$ are
$C^1$-smooth and $D^2h$ is continuous, there exist constants $\alpha, \beta$ such that
$\|A_x - A_0\| < \alpha\|x\|$ and $\|G_x\| < \beta\|x\|$ whenever $x \in B(0;\rho)$. Similarly, from (40)
and Taylor series arguments, there exists a constant $\gamma$ such that
$\|[H_{\hat f}(h(x)) - H_{\hat f}(0)] A_0 x\| < \|[H_{\hat f}(h(x)) - H_{\hat f}(0)] h(x)\| + \|[H_{\hat f}(h(x)) - H_{\hat f}(0)](h(x) - A_0 x)\| < \gamma\|x\|^2$
whenever $x \in B(0;\rho)$. Shrink $\tilde U$ to equal $\tilde\varphi^{-1}(B(0;\rho))$. The result follows by noting

$\|[H_{f \circ \tilde\varphi^{-1}}(x) - H_{f \circ \tilde\varphi^{-1}}(0)]\,x\| < \|[A_x^T H_{\hat f}(h(x)) A_x - A_x^T H_{\hat f}(h(x)) A_0]\,x\|$
$\quad + \|[A_x^T H_{\hat f}(h(x)) A_0 - A_0^T H_{\hat f}(h(x)) A_0]\,x\| + \|[A_0^T H_{\hat f}(h(x)) A_0 - A_0^T H_{\hat f}(0) A_0]\,x\|$
$\quad + \|G_x x\|$. (42)

□

Conditions C1-C2 are also intrinsic; the choice of coordinate charts is immaterial
and the conditions are preserved under diffeomorphisms. Let $h_* : TM \to TN$
denote the push-forward of tangent vectors induced by a map $h : M \to N$
between manifolds; $h_*(v_p) = Dh(p) \cdot v_p$.

Proposition 26. Let $\phi, \psi : TM \to M$ satisfy C1-C2. Then about any point
$p \in M$, C1 and C2 hold with respect to any $C^2$-chart $(U,\varphi)$ with $\varphi(p) = 0$.
Furthermore, if $h : M \to N$ is a $C^2$-diffeomorphism of manifolds then the
induced maps $\tilde\phi = h \circ \phi \circ h_*^{-1}$ and $\tilde\psi = h \circ \psi \circ h_*^{-1}$ satisfy C1-C2.

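Proof. Let $h : M \to N$ be a $C^2$-diffeomorphism. Fix $p \in M$. Let $(\tilde U, \tilde\varphi)$ be a
$C^2$-chart on $N$ with $\tilde\varphi \circ h(p) = 0$. It will be shown $\hat{\tilde\phi} = \tilde\varphi \circ \tilde\phi \circ \tau_{\tilde\varphi}^{-1}$ satisfies H1
and H2, and $\hat{\tilde\psi} = \tilde\varphi \circ \tilde\psi \circ \tau_{\tilde\varphi}^{-1}$ satisfies H3. This proves the second part of the
proposition. The first part then follows by letting $h : M \to M$ be the identity map.
Let $(U, \varphi)$ be a $C^2$-chart on $M$ with $\varphi(p) = 0$ and such that $\hat\phi = \varphi \circ \phi \circ \tau_\varphi^{-1}$
satisfies H1 and H2, and $\hat\psi = \varphi \circ \psi \circ \tau_\varphi^{-1}$ satisfies H3. Let $g = \tilde\varphi \circ h \circ \varphi^{-1}$; it
is a $C^2$-diffeomorphism from $\varphi(U \cap h^{-1}(\tilde U))$ to $\tilde\varphi(h(U) \cap \tilde U)$ and
$\hat{\tilde\psi}(x,y) = g \circ \hat\psi_{g^{-1}(x)} \circ D(g^{-1})(x) \cdot y$. Apply Lemma 20 to conclude $\hat{\tilde\psi}$ satisfies H3.
Analogously, Lemma 20 implies $\hat{\tilde\phi}$ satisfies H1 and H2. □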

6.2. Sufficient Conditions 

Conditions C1-C2 are readily satisfied by $C^2$-smooth parametrisations. In
this case $M$ must be $C^3$-smooth. If $M$ were only $C^2$-smooth then $\phi : TM \to M$
at best can be $C^1$-smooth because $TM$ is only a $C^1$-manifold.

Lemma 27. Let $M$ be a $C^3$-manifold. If $\phi$ is $C^2$-smooth and, for all $p \in M$,
$\phi_p(0_p) = p$ and $D\phi_p(0_p) = I$, then C1 holds. If $\psi$ is $C^2$-smooth and, for all
$p \in M$, $\psi_p(0_p) = p$ and $D\psi_p(0_p) = I$, then C2 holds.

Proof. Follows from Lemma ??. □

Remark 28. Since C1-C2 are local in nature (Lemma 30), it suffices in Lemma 27
for $\phi$ and $\psi$ to be smooth on a neighbourhood of the zero section of $TM$.

Conditions C1-C2 are preserved under restriction to submanifolds. 

Lemma 29. Let $i : N \to M$ be a $C^2$-embedding of $N$ in $M$, with $i_* : TN \to TM$
the induced push-forward of tangent vectors. Let $\phi, \psi : TM \to M$ be
parametrisations of $M$ satisfying C1-C2, and $\tilde\phi, \tilde\psi : TN \to N$ parametrisations
of $N$ satisfying $\phi \circ i_* = i \circ \tilde\phi$ and $\psi \circ i_* = i \circ \tilde\psi$. Then $\tilde\phi, \tilde\psi$ satisfy C1-C2.

Proof. From Proposition 26 it suffices to assume $N \subset M$. Then $\tilde\phi$ and $\tilde\psi$ are
simply the restrictions of $\phi$ and $\psi$ to $TN$. The result follows by observing that
if $h : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ in Section 4.4 is restricted to $\mathbb{R}^{n'} \subset \mathbb{R}^n$ then H1, H2 and
H3 would continue to hold. □

One way to express precisely the local nature of C1-C2 is with the aid of a
Riemannian metric on $M$.

Lemma 30. Let $\phi, \psi : TM \to M$ satisfy C1-C2 where $M$ is a $C^2$-Riemannian
manifold. Let $r : M \to (0, \infty)$ be a possibly discontinuous function. Assume
$\tilde\phi, \tilde\psi : TM \to M$ satisfy $\tilde\phi(v_p) = \phi(v_p)$ and $\tilde\psi(v_p) = \psi(v_p)$ whenever $\|v_p\| < r(p)$.
Then $\tilde\phi$ satisfies C1. If $\inf_{p \in K} r(p) > 0$ for any compact $K \subset M$ then $\tilde\psi$ satisfies
C2.

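Proof. Fix $p \in M$ and let $\varphi$ and $\rho$ be such that C1 and C2 hold. Then
$B(0;\rho) \times B(0;\rho)$ is in the image of $\tau_\varphi$; let $V$ be its pre-image. For any $\bar r > 0$,
the set $V_{\bar r} = \{v_p \in V \mid \|v_p\| < \bar r\}$ is open, hence $\tau_\varphi(V_{\bar r})$ is open too.

Choose an $x \in B(0;\rho)$ and let $\bar r = r(\varphi^{-1}(x))$. There exists a $\delta_x > 0$ such
that $(x, B(0;\delta_x)) \subset \tau_\varphi(V_{\bar r})$. Restricted to $(x, B(0;\delta_x))$, $\varphi \circ \tilde\phi \circ \tau_\varphi^{-1}$ and
$\varphi \circ \phi \circ \tau_\varphi^{-1}$ are equal. It follows that $\tilde\phi$ satisfies C1.

Let $K = \varphi^{-1}(\bar B(0;\rho/2))$ and $\bar r = \inf_{p \in K} r(p)$. Let $\bar\rho \in (0, \rho/2)$ be such that
$B(0;\bar\rho) \times B(0;\bar\rho) \subset \tau_\varphi(V_{\bar r})$. Restricted to $B(0;\bar\rho) \times B(0;\bar\rho)$, $\varphi \circ \tilde\psi \circ \tau_\varphi^{-1}$ and
$\varphi \circ \psi \circ \tau_\varphi^{-1}$ are equal. It follows that $\tilde\psi$ satisfies C2. □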

6.3. Embedded Submanifolds of Euclidean Space 

For manifolds embedded in Euclidean space, C1-C2 can be expressed in 
extrinsic coordinates. 

Treating $\mathbb{R}^m$ as a manifold, a parametrisation $\phi : T\mathbb{R}^m \to \mathbb{R}^m$ can be specified
by its representation $\hat\phi : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}^m$ with respect to the identity
chart, denoted $\phi = \hat\phi \circ \tau_I$. Given a $C^2$-embedding $i : M \to \mathbb{R}^m$, let $V_xM$
for $x \in i(M)$ denote the realisation of $T_{i^{-1}(x)}M$ as a subspace of $\mathbb{R}^m$, that is,
$(x, V_xM) = \tau_I \circ i_*(T_{i^{-1}(x)}M)$ where $i_* : TM \to T\mathbb{R}^m$ is the push-forward of $i$.
(The elements of $V_xM$ are the vectors $\gamma'(0)$ where $\gamma : (-\epsilon, \epsilon) \to \mathbb{R}^m$, $\gamma(0) = x$,
is a curve whose image is contained in $i(M)$.)

If $\hat\phi(x,y)$ belongs to $i(M)$ whenever $x \in i(M)$ and $y \in V_xM$ then it induces
a parametrisation $\phi : TM \to M$ given by $\phi = i^{-1} \circ \hat\phi \circ \tau_I \circ i_*$. In essence, $\phi$
maps a point $x + y$ on the affine tangent space of $i(M)$ at $x$, to the point $\hat\phi(x,y)$
on $i(M)$. This is how parametrisations were specified in [?].

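Lemma 31. Let $i : M \to \mathbb{R}^m$ be a $C^2$-embedding of a manifold $M$. With
notation as above, assume $\hat\phi, \hat\psi : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}^m$ satisfy: $\forall z \in i(M)$,
$\exists \alpha, \beta, \rho \in \mathbb{R}$ with $\rho > 0$, $\forall x \in B(z;\rho) \cap i(M)$, $\forall y \in B(0;\rho) \cap V_xM$,
$\hat\phi_x(y), \hat\psi_x(y) \in i(M)$, $\hat\phi_x(0) = x$, $D\hat\phi_x(0) \cdot y = y$,
$\|D^2\hat\phi_x(0) \cdot (y,y)\| < \alpha\|y\|^2$, $\|\hat\psi_x(y) - x - y\| < \beta\|y\|^2$.
Then the parametrisations $\phi, \psi : TM \to M$ defined by $\phi = i^{-1} \circ \hat\phi \circ \tau_I \circ i_*$ and
$\psi = i^{-1} \circ \hat\psi \circ \tau_I \circ i_*$ satisfy C1 and C2 of Section 6.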

Proof. For $x \in i(M)$, let $P_x : \mathbb{R}^m \to V_xM$ denote Euclidean projection onto
$V_xM$. Extend $\hat\phi$ by defining $\hat\phi(x,y) = x + y$ for $x \notin i(M)$, and
$\hat\phi(x,y) = \hat\phi(x, P_x(y)) + y - P_x(y)$ for $x \in i(M)$ and $y \notin V_xM$. Extend $\hat\psi$ similarly.
Then $\bar\phi = \hat\phi \circ \tau_I$ and $\bar\psi = \hat\psi \circ \tau_I$ satisfy C1-C2. (Fix $p \in \mathbb{R}^m$ and define $\varphi(x) = x - p$.
Note $\tau_I \circ \tau_\varphi^{-1}(x, y) = (x + p, y)$. Hence $\varphi \circ \bar\phi \circ \tau_\varphi^{-1}(x, y) = \hat\phi(x + p, y) - p$. Same
for $\bar\psi$. It is readily verified the assumptions in the lemma ensure H1, H2
and H3 are satisfied.) Hence, from Lemma 29, $\phi$ and $\psi$ satisfy C1-C2. □

7. Whimsical and Path-Dependent Newton Methods

The generalised Newton method in Section 6 is fully defined by the choice
of parametrisations $\phi$ and $\psi$, and Lemma 27 shows it essentially suffices to
choose $\phi$ and $\psi$ to be $C^2$-smooth to ensure local quadratic convergence to
non-degenerate critical points. This may appear elegant and straightforward. In
practice though, especially if $M$ is a quotient space, directly writing down a
smooth parametrisation $\phi$ may not be the most desirable approach. While
numerous possible choices may come to mind for each $\phi_p : T_pM \to M$, difficulties
arise if no canonical choice is evident for each $p$ that would make $\phi$ smooth.

One way to understand this is by contemplating assigning parametrisations
$\phi_p$ point-by-point on a manifold by starting at a particular point and spreading
out in all directions. The non-flatness of a manifold may cause these initially 
divergent directions to begin to converge, with some points ultimately reached 
from multiple directions; consider the sphere. Unless special care is taken, the 
parametrisations will not match up at such points. 

This has not been seen before as a problem because implicit or explicit use
typically has been made of a Riemannian metric to guide the construction of
parametrisations. Furthermore, the affine invariance of the Newton method plays
a critical role as it means parametrisations constructed locally only have to agree 
globally with each other up to affine transformations, which is easier (indeed, 
possible) to achieve. In particular, when thinking of a manifold as an object 
embedded in Euclidean space, it is visually clear how the affine tangent space 
can be moved around the manifold, and although two different paths from p 
to q may move the affine tangent space differently, the only difference will be 
a rotation. Strategies such as projection from the affine tangent space onto 
the manifold, or the use of the Riemannian exponential map, can then be used 
to generate a smooth parametrisation $\phi : TM \to M$ with which the Newton
method can be lifted from Euclidean space to the manifold $M$.

The sphere $S^2$ highlights the role of affine invariance. Although smooth
parametrisations $\phi : TS^2 \to S^2$ are readily constructed, the hairy-ball theorem
implies there is no global section of the frame bundle on $S^2$, meaning there is
no way to identify each $T_pS^2$ with $\mathbb{R}^2$ in a smooth way. If not for the affine
invariance of the Newton method, it would not be possible to construct smoothly
varying parametrisations with which to lift the Newton method to $S^2$.

When dealing with abstract manifolds that do not embed canonically in Eu- 
clidean space, it is not necessarily desirable to introduce a Riemannian metric 
to facilitate the construction of a suitable parametrisation. However, without 
a global structure such as a metric to guide the process, the aforementioned 
problem remains of how to construct parametrisations locally that fit together 
globally. A natural solution is to sidestep this issue by allowing Newton meth- 
ods on manifolds to construct parametrisations dynamically as the sequence 
of iterates unfolds. Such methods are called path-dependent Newton methods
because the parametrisation $\phi_{p_k}$ used at the $k$th step may depend on the
path $p_0, p_1, \ldots, p_{k-1}$ leading up to $p_k$. One of the many possibilities this opens
up is using (non-metric) affine connections and parallel transport to construct
parametrisations. See Section 8 for other possibilities.

The theory in Section 6 extends to encompass path-dependent Newton methods
because there is no inherent requirement for the $\phi_p$ to vary smoothly, or even
continuously, in $p$. The essence of H1-H3 in Section 4.4 is that the bounds $\alpha$ and
$\beta$ hold uniformly on sufficiently small neighbourhoods. Therefore, a generalised
Newton method can be constructed by specifying a $\phi_p$ for each $p$ with little
regard for how the $\phi_p$ fit together to form $\phi$. In other words, a generalised Newton
method at each step is free to choose from many different parametrisations $\phi_p$
without losing its local quadratic convergence.

It is expedient to study path-dependent Newton methods in terms of whimsical
Newton methods. Let $\Sigma = \{\Sigma_p \mid p \in M\}$ be a collection of sets $\Sigma_p$ of
pairs $(\phi_p, \psi_p)$ of parametrisations $\phi_p, \psi_p : T_pM \to M$. A whimsical Newton
method with respect to $\Sigma$ is the general term given to any iterative scheme
$p_{k+1} = E_f^{(k)}(p_k)$ where $E_f^{(k)}(p)$ is a generalised Newton iteration function
using a pair of parametrisations $(\phi_p, \psi_p)$ belonging to $\Sigma_p$. Indexing $E_f$ by $k$
means that even if $p_{k+j} = p_k$ for some $j > 0$, a different parametrisation pair
can be chosen from $\Sigma_{p_k}$ at the $k$th and $(k + j)$th steps.
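A whimsical Newton method can be sketched on the unit circle as follows. Everything specific here is an assumption made for illustration, not the paper's construction: the cost $f(x,y) = x + \tfrac12 xy$, a fixed $\phi_p(t) = p\cos t + \tau_p\sin t$, and a family of three choices for $\psi_p$ (rotation by $t$, projection from the tangent line, and rotation by $2\arctan(t/2)$), one of which is drawn at random at every step.

```python
import math
import random

def grad_f(p):
    """Gradient of the illustrative ambient cost f(x, y) = x + 0.5 * x * y."""
    return (1.0 + 0.5 * p[1], 0.5 * p[0])

def newton_step(p):
    """N_{f o phi_p}(0) for phi_p(t) = p cos t + tau sin t, tau = (-p_y, p_x)."""
    tau = (-p[1], p[0])
    g = grad_f(p)
    d1 = g[0] * tau[0] + g[1] * tau[1]
    d2 = tau[0] * tau[1] - (g[0] * p[0] + g[1] * p[1])
    return -d1 / d2

def rotate(p, c, s):
    """Rotate p by the angle whose cosine and sine are c and s."""
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def psi_rotate(p, t):
    return rotate(p, math.cos(t), math.sin(t))

def psi_project(p, t):
    n = math.sqrt(1.0 + t * t)
    return rotate(p, 1.0 / n, t / n)

def psi_cayley(p, t):
    n = 1.0 + 0.25 * t * t
    return rotate(p, (1.0 - 0.25 * t * t) / n, t / n)

PSI_FAMILY = (psi_rotate, psi_project, psi_cayley)

def whimsical_newton(p, n_steps, rng):
    """Apply n_steps Newton steps, picking psi_p from PSI_FAMILY at random each time."""
    for _ in range(n_steps):
        psi = rng.choice(PSI_FAMILY)
        p = psi(p, newton_step(p))
    return p

start = (math.cos(0.8), math.sin(0.8))
limits = [whimsical_newton(start, 8, random.Random(seed)) for seed in range(5)]
print(limits)
```

Each member of the family satisfies $\|\psi_p(t) - (p + t\tau_p)\| \le \beta t^2$ with a common $\beta$ for small $t$, which is the kind of uniformity the conditions below ask for; the five runs should therefore agree on the limit regardless of the random choices.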

Imposing the following uniformity constraints on the elements of $\Sigma$ ensures
any whimsical Newton method with respect to $\Sigma$ converges locally quadratically
to non-degenerate critical points. Recall the definition of $A_p$ in Section 6.

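Formally, conditions E0-E2 are satisfied if $\Sigma_p$ is non-empty for all $p \in M$,
and furthermore, $\forall p^* \in M$, $\exists$ $C^2$-chart $(U, \varphi)$ with $\varphi(p^*) = 0$, $\exists \rho > 0$,
$\exists \alpha, \beta \in \mathbb{R}$, $\forall p \in U$, $\forall (\phi_p, \psi_p) \in \Sigma_p$:

E0 $\phi_p(0_p) = \psi_p(0_p) = p$ and $D\phi_p(0_p) = D\psi_p(0_p) = I$;

E1 $\hat\phi_p = \varphi \circ \phi_p \circ A_p^{-1}$ satisfies $\|D^2\hat\phi_p(0)\| < \alpha$;

E2 $\hat\psi_p = \varphi \circ \psi_p \circ A_p^{-1}$ satisfies $\|\hat\psi_p(y) - \varphi(p) - y\| < \beta\|y\|^2$ whenever $\|y\| < \rho$.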

Theorem 32. Let $f : M \to \mathbb{R}$ be a $C^2$-smooth cost function on a $C^2$-smooth
manifold $M$ satisfying (40) at a non-degenerate critical point $p^*$. If $\Sigma$ satisfies
E0-E2 then any whimsical Newton method $p_{k+1} = E_f^{(k)}(p_k)$ with respect to $\Sigma$
converges locally quadratically to $p^*$. Uniform bounds exist for the rate of
convergence (that is, the constants $\kappa$ and $K$ in ?? and Definition 23 with respect to
a given local coordinate chart) that are independent of how $E_f^{(k)}(p)$ selects which
pair $(\phi_p, \psi_p) \in \Sigma_p$ to use at the $k$th step.

Theorem 32 can be proved by observing in the proof of Theorem ?? that the
rate of convergence is determined purely in terms of bounds on the second-order
behaviour of the parametrisations. Provided the bounds $\alpha$ and $\beta$ remain valid,
the pairs of parametrisations used become irrelevant. Similarly, it follows from
the proofs of Proposition 26 and Lemma 20 that if conditions E0-E2 are satisfied
with respect to one chart, they are satisfied with respect to any other.

A path-dependent Newton method differs from a whimsical Newton method
in that the rule for choosing the parametrisation pair to use at each step may
depend on previous iterates.

Corollary 33. Let $f : M \to \mathbb{R}$ be a $C^2$-smooth cost function on a $C^2$-smooth
manifold $M$ satisfying (40) at a non-degenerate critical point $p^*$. Let
$p_{k+1} = E_f^{(k, p_0, \ldots, p_{k-1})}(p_k)$ be a path-dependent Newton iterate with respect to an indexed
family $\Sigma = \{\Sigma_p \mid p \in M\}$ of sets $\Sigma_p$ of parametrisation pairs:
$p_{k+1} = \psi_{(k)} \circ N_{f \circ \phi_{(k)}}(0_{p_k})$ where the rule for choosing $(\phi_{(k)}, \psi_{(k)}) \in \Sigma_{p_k}$ may depend on past
iterates $p_0, \ldots, p_{k-1}$ as well as on $k$ and $p_k$. If $\Sigma$ satisfies E0-E2 then this
path-dependent Newton iterate converges locally quadratically to $p^*$.

Proof. Assume to the contrary the existence of $f$, $p^*$ and a sequence of initial
points $\{p_0^{(i)}\}$ converging to $p^*$ such that the path-dependent Newton iterate
started at any $p_0^{(i)}$ does not converge quadratically to $p^*$. For each $p_0^{(i)}$, the
resulting path-dependent Newton iterate is a whimsical Newton method with
respect to $\Sigma$. From Theorem 32 there exists a neighbourhood of $p^*$ such that
any whimsical Newton iterate with respect to $\Sigma$ that starts within this
neighbourhood will converge quadratically to $p^*$. That $p_0^{(i)} \to p^*$ contradicts this. □

The motivation given earlier for introducing path-dependent methods was 
that it is easier and more natural to construct parametrisations locally then 
extend path-wise than to construct parametrisations globally because the latter 
requires the local parametrisations to fit together globally. Another use for 
path-dependent methods is to give the algorithm memory. This leads into the 
study of general techniques for extending conjugate gradient and other such 
methods to manifolds, a topic outside the scope of the present paper. 

8. Re-Centring and Other Parametrisation Construction Techniques 

Various strategies exist for choosing parametrisations that satisfy C1-C2 in
Section 6. If the class of cost functions of interest is known beforehand then this
knowledge should inform the choice of parametrisation; see Section 4.3. There
is also interest in choosing relatively simple parametrisations leading to generic
algorithms designed without regard to any particular class of cost functions.
A basic idea for how to do this is introduced in Section 8.2 and generalised
in subsequent sections. It is called re-centring and exploits the existence of a
local diffeomorphism between any two parts of a manifold. This changes the
focus from devising parametrisations to devising transformations. If $M$ were a
sphere, for example, an alternative interpretation is that instead of producing a
sequence of points $p_1, p_2, \ldots$ on $M$ converging to a critical point $p^*$, the manifold
$M$, along with the cost function $f$, can be rotated at each step so that first $p_1$
is brought to the North pole, then $p_2$, and so forth. Since each Newton step is
always taken from the North pole, its design is simplified.

8.1. Submersions and Fibre Bundles 

It may happen that the manifold $N$ for which a parametrisation is sought is
the image of a smooth function $g : M \to N$ where $M$ is simpler to parametrise.
For example, $M$ might be a matrix Lie group and $N$ a homogeneous space.
Since a cost function $f : N \to \mathbb{R}$ pulls back to a cost function $f \circ g$ on $M$, an
iterative scheme on $M$ should induce an iterative scheme on $N$. Simply pulling
$f$ back is not recommended if $\dim M > \dim N$ because the final algorithmic
complexity might increase and non-degenerate critical points of $f$ can become
degenerate critical points of $f \circ g$.

An alternative is to endeavour to "push forwards" the parametrisation on $M$.
Let $g_* : TM \to TN$ be the induced push-forward of $g : M \to N$ and let $p \in M$
be such that $g$ is a submersion at $p$. Then $T_pM$ splits into a vertical component
$V_p$ and a non-unique horizontal component $H_p$, that is, there exists a subspace
$H_p \subset T_pM$ such that $T_pM = V_p \oplus H_p$ where $V_p = \{v \in T_pM \mid g_*(v) = 0\}$. Since
$g$ is a submersion at $p$, $g_*$ induces a linear isomorphism from $H_p$ to $T_{g(p)}N$,
denoted $g_*|_{H_p}$. In particular, a parametrisation $\phi_p : T_pM \to M$ can be used to
form the parametrisation $\tilde\phi_{g(p)} = g \circ \phi_p \circ (g_*|_{H_p})^{-1}$ from $T_{g(p)}N$ into $N$.
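As a hedged illustration of this push-forward, take $M$ to be the punctured plane, $N$ the unit circle and $g(x) = x/\|x\|$. These choices, the trivial parametrisation $\phi_p(v) = p + v$ and the horizontal space $H_p = \{v \mid v \cdot p = 0\}$ are assumptions made here, not taken from the paper. For $v \in H_p$ one has $Dg(p)\,v = v/\|p\|$, so $(g_*|_{H_p})^{-1}(w) = \|p\|\,w$.

```python
import math

def g(x):
    """Submersion from the punctured plane onto the unit circle."""
    n = math.hypot(x[0], x[1])
    return (x[0] / n, x[1] / n)

def phi_M(p, v):
    """Trivial parametrisation of the punctured plane: phi_p(v) = p + v."""
    return (p[0] + v[0], p[1] + v[1])

def horizontal_lift(p, w):
    """Inverse of g_* restricted to H_p = {v : v . p = 0}, namely w -> |p| * w."""
    r = math.hypot(p[0], p[1])
    return (r * w[0], r * w[1])

def phi_N(p, w):
    """Induced parametrisation g o phi_p o (g_*|H_p)^{-1} at the point q = g(p)."""
    return g(phi_M(p, horizontal_lift(p, w)))

angle = 0.7
q = (math.cos(angle), math.sin(angle))
tau = (-q[1], q[0])                     # unit tangent to the circle at q
w = (0.3 * tau[0], 0.3 * tau[1])        # a tangent vector at q
from_near = phi_N((0.2 * q[0], 0.2 * q[1]), w)   # preimage p = 0.2 q
from_far = phi_N((5.0 * q[0], 5.0 * q[1]), w)    # preimage p = 5 q
print(from_near, from_far)
```

With this particular horizontal space every preimage $p = rq$ induces the same map, the projection of $q + w$ onto the circle. A different choice of $H_p$ would in general give parametrisations that depend on $p$, which is the situation the remainder of this section addresses.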

Naively, take $\Sigma_q$ to be the set of all parametrisation pairs $(\tilde\phi_q, \tilde\psi_q)$ coming
from parametrisations $\phi_p, \psi_p : T_pM \to M$ where $q = g(p)$ and $g$ is a submersion
at $p$. Provided every point $q \in N$ has at least one preimage $p$ such that $g$ is a
submersion at $p$, a whimsical or path-dependent Newton method is well-defined;
see Section 7. It may not converge locally quadratically though.

Assuming $\phi$ and $\psi$ satisfy C1-C2 in Section 6, it is not necessarily the case
that the resulting $\Sigma$ will satisfy E1-E2 in Section 7. There are essentially two
ways for E1-E2 to fail to hold. Visually, the first is if the angle (with respect
to a Riemannian metric placed on $M$) between $H_p$ and $V_p$ can approach zero
as $p$ varies. The second is if $g^{-1}(q)$ is unbounded and $\phi_p$ for $p \in g^{-1}(q)$ gets
arbitrarily ill-behaved as $p$ goes to infinity.

If $g : M \to N$ happens to be a compact fibre bundle and an Ehresmann
connection is chosen, thereby determining the horizontal bundle, then a
compactness argument can be made for E1-E2 to hold.

A more general approach is to limit the number of parametrisation pairs in
$\Sigma_q$ by judiciously choosing which preimages $p \in g^{-1}(q)$ to use. For example,
if there exist a locally finite open cover $\{U_\gamma\}$ of $N$ and smooth functions
$h_\gamma : U_\gamma \to M$ such that each $g \circ h_\gamma$ is the identity map, then $\Sigma_q$ need only contain
the finite number of parametrisation pairs coming from the preimages $p = h_\gamma(q)$
for those $\gamma$ for which $q \in U_\gamma$, and C1-C2 will imply E1-E2.

8.2. Re-centring via a Group Action

If a Lie group $G$ acts transitively on $M$ then a generalised Newton method
can be devised by continually re-centring the cost function about a distinguished
point. Precisely, fix $\bar p \in M$, choose parametrisations $\phi_{\bar p}, \psi_{\bar p} : T_{\bar p}M \to M$ and
define $E_f(\bar p)$ as in (39). The group action $p \mapsto g \cdot p$ allows $E_f(p)$ to be defined
for $p \neq \bar p$ by $E_f(p) = h(p) \cdot E_{\tilde f}(\bar p)$ where $h : M \to G$ is an arbitrary function
satisfying $h(p) \cdot \bar p = p$, and $\tilde f(q) = f(h(p) \cdot q)$ is the re-centred cost function.
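The re-centring rule $E_f(p) = h(p) \cdot E_{\tilde f}(\bar p)$ can be sketched for $G = SO(2)$ acting on the unit circle. The concrete choices below are assumptions for illustration only: $\bar p = (1,0)$, the cost $f(x,y) = x + \tfrac12 xy$, $\phi_{\bar p}(t) = (\cos t, \sin t)$, and $\psi_{\bar p}(t)$ the projection of $(1, t)$ onto the circle. The re-centred cost $\tilde f(q) = f(Rq)$ has gradient $R^T \nabla f(Rq)$ and Hessian $R^T H R$.

```python
import math

P_BAR = (1.0, 0.0)                       # distinguished point on the circle
HESS_F = ((0.0, 0.5), (0.5, 0.0))        # Hessian of f(x, y) = x + 0.5 * x * y

def grad_f(p):
    return (1.0 + 0.5 * p[1], 0.5 * p[0])

def h(p):
    """Rotation matrix in SO(2) with h(p) . P_BAR = p."""
    return ((p[0], -p[1]), (p[1], p[0]))

def act(m, q):
    return (m[0][0] * q[0] + m[0][1] * q[1], m[1][0] * q[0] + m[1][1] * q[1])

def transpose(m):
    return ((m[0][0], m[1][0]), (m[0][1], m[1][1]))

def newton_at_p_bar(rot):
    """Newton step E_{f~}(P_BAR) for the re-centred cost f~(q) = f(rot . q)."""
    grad = act(transpose(rot), grad_f(act(rot, P_BAR)))   # R^T grad f(R p_bar)
    r_e2 = act(rot, (0.0, 1.0))
    h_e2 = act(HESS_F, r_e2)
    hess_22 = r_e2[0] * h_e2[0] + r_e2[1] * h_e2[1]       # (R^T H R)[1][1]
    d1 = grad[1]              # first derivative of t -> f~(cos t, sin t) at 0
    d2 = hess_22 - grad[0]    # second derivative at 0
    t = -d1 / d2
    s = math.sqrt(1.0 + t * t)
    return (1.0 / s, t / s)   # psi at P_BAR: project (1, t) onto the circle

def E_f(p):
    """Re-centre at P_BAR, take the Newton step there, then move back with h(p)."""
    rot = h(p)
    return act(rot, newton_at_p_bar(rot))

p = (math.cos(0.8), math.sin(0.8))
for _ in range(6):
    p = E_f(p)
print(p)
```

Only the Newton step at $\bar p$ has to be designed; the group action carries it to every other point.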

If no obvious rule for choosing $h$ comes to mind, the framework of Section 7
can be used. Precisely, define $\Sigma_p = \{(\phi_g, \psi_g) \mid g \in G,\ g \cdot \bar p = p\}$ where
$\phi_g = \theta_g \circ \phi_{\bar p} \circ \Theta_g^{-1}$, $\psi_g = \theta_g \circ \psi_{\bar p} \circ \Theta_g^{-1}$, $\theta_g(p) = g \cdot p$ and $\Theta_g$ is the restriction of
$(\theta_g)_*$ to $T_{\bar p}M$. Here, $(\theta_g)_* : TM \to TM$ is the push-forward of $\theta_g : M \to M$.
The parametrisation pair $(\phi_g, \psi_g)$ yields the same Newton step as before:
$\psi_g \circ N_{f \circ \phi_g}(0_{g \cdot \bar p}) = g \cdot E_{\tilde f}(\bar p)$ where $\tilde f(q) = f(g \cdot q)$. Assume $\phi_{\bar p}$ and $\psi_{\bar p}$ are
sensible (e.g., $C^2$-smooth). Provided the first and second order derivatives of
$\theta_g$ are bounded in an appropriate sense then E0-E2 will hold; cf. Lemma ??.
This will be the case if $G$ is compact, for example. Otherwise, the number of
parametrisations in each $\Sigma_p$ can be limited with the aid of a finite open cover
of $M$ on which local sections are defined; cf. end of Section 8.1.

8.3. Re-centring via Affine Transformations 

Re-centring can be applied to a manifold M embedded in Euclidean space 
by using affine transformations of Euclidean space to bring any point p £ M 
of interest to the origin in such a way that the transformed version of M is 
a graph of a function in a neighbourhood of the origin. Therefore, a rule for 
parametrising a graph of a function induces a rule for parametrising M. This 
idea will be used in Section 8.4. The present section focuses on the reverse
direction: determine if a parametrisation pair $(\phi, \psi)$ satisfies C1-C2 in Section 6
by studying the corresponding re-centred parametrisations.

Denote by $\mathfrak{H}_\delta$ the space of $C^2$-smooth functions $h : B_n(0;\delta) \to \mathbb{R}^k$ satisfying
$h(0) = 0$, $Dh(0) = 0$ and $\sup_{t \in B_n(0;\delta)} \|D^2h(t)\| < \infty$. (Recall $B_n$ is an open
ball in $\mathbb{R}^n$.) Associate with any $h \in \mathfrak{H}_\delta$ the manifold

$M_{(h,\delta)} = \{(t, h(t)) \mid t \in B_n(0;\delta)\} \subset \mathbb{R}^{n+k}$. (43)

A parametrisation $\phi_{(0,0)} : T_{(0,0)}M_{(h,\delta)} \to M_{(h,\delta)}$ can be represented by a
function $\pi : \mathbb{R}^n \to \mathbb{R}^n$ taking $x$ to the point $(\pi(x), h \circ \pi(x))$ on $M_{(h,\delta)}$. Here,
$x$ represents the point $(x, 0)$ on the affine tangent space at the origin of $M_{(h,\delta)}$.
An exemplar is using Euclidean projection from $(x, 0)$ to $M_{(h,\delta)}$ to define $\pi$, so
that $\pi$ satisfies

$\|(\pi(x), h \circ \pi(x)) - (x, 0)\| = \min_{t \in B_n(0;\delta)} \|(t, h(t)) - (x, 0)\|$ (44)

whenever the minimum exists. This will be studied in Section 8.4.
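The projection in (44) can be computed numerically in the simplest case $n = k = 1$. The sketch below assumes, for illustration only, the graph of $h(t) = t^2/2$; it finds $\pi(x)$ by solving the stationarity condition $(t - x) + h(t)h'(t) = 0$ with a scalar Newton iteration started at $t = x$. For this $h$ the squared distance is strictly convex in $t$, so the stationary point is the minimiser.

```python
def h(t):
    """Illustrative graph function with h(0) = 0 and h'(0) = 0."""
    return 0.5 * t * t

def dh(t):
    return t

def d2h(t):
    return 1.0

def project(x, max_iter=50):
    """Return pi(x), the first coordinate of the closest point to (x, 0) on the graph of h."""
    t = x
    for _ in range(max_iter):
        residual = (t - x) + h(t) * dh(t)           # derivative of half the squared distance
        slope = 1.0 + dh(t) ** 2 + h(t) * d2h(t)    # its derivative, positive for this h
        step = residual / slope
        t -= step
        if abs(step) < 1e-15:
            break
    return t

samples = [k / 10.0 for k in range(-10, 11)]
deviations = [(x, abs(project(x) - x)) for x in samples]
print(deviations)
```

Here $\pi(x) - x = -\pi(x)^3/2$, so $|\pi(x) - x| \le \tfrac12 |x|^2$ for $|x| \le 1$: a bound of the form $\|\pi(x) - x\| < \beta\|x\|^2$ holds with $\beta$ close to $\tfrac12$ on this domain.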

It is not important for $\pi$ to be defined uniquely by a rule, nor for $\pi$ to be
defined on the whole of $\mathbb{R}^n$. Essentially, it is merely required that any choice of
$\pi$ is defined on a sufficiently small domain $B_n(0;\rho)$ and satisfies $\|D^2\pi(0)\| < \alpha$
and $\|\pi(x) - x\| < \beta\|x\|^2$ for $x \in B_n(0;\rho)$, where the constants $\rho$, $\alpha$ and $\beta$ depend
on $h$ and $\delta$ in a way that ensures they remain uniformly bounded if $h$ and $\delta$ are
perturbed; cf. Section 7. This is now made precise.

A parametrisation $\phi : TM \to M$ of an $n$-dimensional embedded submanifold
$M \subset \mathbb{R}^{n+k}$ is said to satisfy condition D if D1-D6 below are satisfied.
Central to this condition is a class of functions $\pi$ obtained from $\phi$ as follows.
Associate to each $x \in M \subset \mathbb{R}^{n+k}$ a rotation $R_x$ of $\mathbb{R}^{n+k}$
sending $V_xM$ to $\mathbb{R}^n \times \{0\}$. (Here, the notation of Section 6.3 is being
used, but with $i$ and $i_*$ omitted because $M$ is being treated as an actual subset of
$\mathbb{R}^{n+k}$.) Define the translation $Q_x(q) = q - x$. For any pair $(h, \delta)$
for which $R_xQ_x(M)$ locally looks like $M_{(h,\delta)}$, meaning there exists a set $U$
open in $\mathbb{R}^{n+k}$ such that $R_xQ_x(M) \cap U = M_{(h,\delta)}$, a function
$\pi : \mathbb{R}^n \to \mathbb{R}^n$ can be defined implicitly by

$$(\pi(y), h \circ \pi(y)) = R_x Q_x \tilde\phi(x, R_x^{-1} J y) \qquad (45)$$

where $\tilde\phi = \phi \circ \tau_1^{-1}$ is the representation of $\phi$ in local
coordinates with respect to the identity chart (on $\mathbb{R}^{n+k}$; refer to
Section 6.3) and $J : \mathbb{R}^n \to \mathbb{R}^{n+k}$ sends $y$ to $J(y) = (y, 0)$.
To emphasise, $\pi$ is only defined at points $y \in \mathbb{R}^n$ for which (45) holds.
Note too that $\pi$, which depends on the triple $x$, $h$ and $\delta$, is merely
$\phi_x$, the restriction of $\phi$ to $T_xM$, written in a canonical form (albeit
depending on the choice of rotation $R_x$).

Condition D requires there to exist functions
$\alpha, \beta, \rho : (0, \infty) \times [0, \infty) \to \mathbb{R}$ such that for all
$x \in M$, $\delta > 0$ and $h \in \mathfrak{H}_\delta$ for which $R_xQ_x(M)$ locally looks
like $M_{(h,\delta)}$ and for $K = \sup_{t \in B_n(0;\delta)} \|D^2h(t)\|$:

D1 The domain of definition of $\pi$ in (45) includes $B_n(0; \rho(\delta, K))$;

D2 $\|D^2\pi(0)\| \le \alpha(\delta, K)$;

D3 $\|\pi(y) - y\| \le \beta(\delta, K)\|y\|^2$ for $y \in B_n(0; \rho(\delta, K))$.

It is also required that for all $\delta > 0$ and all $\bar K \in [0, \infty)$:

D4 $\sup_{K \in [0, \bar K]} \alpha(\delta, K) < \infty$;

D5 $\sup_{K \in [0, \bar K]} \beta(\delta, K) < \infty$;

D6 $\inf_{K \in [0, \bar K]} \rho(\delta, K) > 0$.

A sufficient condition for D4-D6 to hold is for $\alpha$ and $\beta$ to be upper
semi-continuous in $K$, and $\rho$ lower semi-continuous in $K$.

Proposition 34. Let $\phi : TM \to M$ be a parametrisation of an $n$-dimensional
$C^2$-smooth embedded submanifold $M \subset \mathbb{R}^{n+k}$. If $\phi$ satisfies
condition D described above then $\phi$ and $\psi = \phi$ satisfy C1-C2 in Section 6.
(If $\phi$ satisfies D1, D2, D4 and, instead of D3 and D6, the weaker conditions that
$\pi(0) = 0$, $D\pi(0) = I$ and $\rho(\delta, K) > 0$, then $\phi$ satisfies C1. If
$\phi$ satisfies D1, D3, D5 and D6 then $\psi = \phi$ satisfies C2.)

Proof. When convenient, elements of $\mathbb{R}^{n+k}$ are written as
$(u, v) \in \mathbb{R}^n \times \mathbb{R}^k$, with projections $P_1$ and $P_2$ sending
$(u, v)$ to $u$ and $v$ respectively. Fix a point $z \in M \subset \mathbb{R}^{n+k}$.
Let $\delta > 0$ and $h \in \mathfrak{H}_{6\delta}$ be such that
$M_{(h,6\delta)} = R_zQ_z(M) \cap U$ for some open set $U \subset \mathbb{R}^{n+k}$.
Let $K = \sup_{t \in B(0;6\delta)} \|D^2h(t)\|$. By shrinking $\delta$ if necessary, it
is assumed without loss of generality that $\delta \le 1/(4K)$ and
$\|Dh(t)\| \le 1/4$ for $t \in B(0; 6\delta)$.

Choose an arbitrary $t \in B(0;\delta)$. Define $x = Q_z^{-1}R_z^{-1}(t, h(t))$ and
$f_t(\tau) = P_1R_xR_z^{-1}(\tau, h(t + \tau) - h(t))$ for
$\tau \in B(-t; 6\delta) \supset B(0; 5\delta)$. It will be shown that $R_xQ_x(M)$
locally looks like $M_{(h_t,\delta)}$ where
$h_t(u) = P_2R_xR_z^{-1}(f_t^{-1}(u), h(t + f_t^{-1}(u)) - h(t))$.
First, bounds on $f_t$ and its derivatives
$Df_t(\tau) \cdot \xi = P_1R_xR_z^{-1}(\xi, Dh(t + \tau) \cdot \xi)$ and
$D^2f_t(\tau) \cdot \xi^2 = P_1R_xR_z^{-1}(0, D^2h(t + \tau) \cdot \xi^2)$ are obtained.
Importantly, $P_2R_xR_z^{-1}(\xi, Dh(t) \cdot \xi) = 0$ because
$R_z^{-1}(\xi, Dh(t) \cdot \xi)$ lies in $V_xM$, the latter a consequence of the curve
$\gamma(\theta) = Q_z^{-1}R_z^{-1}(t + \theta\xi, h(t + \theta\xi))$ lying on $M$ and
having $\gamma'(0) = R_z^{-1}(\xi, Dh(t) \cdot \xi)$. Therefore,
$\|Df_t(0) \cdot \xi\| = \|(\xi, Dh(t) \cdot \xi)\| \ge \|\xi\|$. Similarly,
$\|f_t(\tau)\| \ge \|\tau\| - \|h(t + \tau) - h(t) - Dh(t) \cdot \tau\| \ge \|\tau\| - \frac12 K\|\tau\|^2$
(see Lemma HZl). For $\tau_1, \tau_2 \in B(-t; 6\delta)$,
$\|Df_t(\tau_1) - Df_t(\tau_2)\| \le \|Dh(t + \tau_1) - Dh(t + \tau_2)\| \le \frac12$.
Thus $\|(Df_t(\tau))^{-1}\| \le 2$ because, for $\|\xi\| > 0$,
$\|Df_t(\tau) \cdot \xi\| \ge \|Df_t(0) \cdot \xi\| - \|(Df_t(\tau) - Df_t(0)) \cdot \xi\| \ge \frac12\|\xi\|$.
Lemma 1 of [5, Chapter 16] implies $f_t$ is injective on $B(-t; 6\delta)$, and, since
$Df_t$ is invertible, the inverse function theorem implies $f_t$ is a
$C^2$-diffeomorphism from $B(-t; 6\delta)$ onto its image. Furthermore, since
$\|\tau\| = 4\delta$ implies
$\|f_t(\tau) - f_t(0)\| \ge 4\delta - \frac12 K(4\delta)^2 \ge 2\delta$, the proof of
Lemma 2 of [5, Chapter 16] implies $B(0;\delta) \subset f_t(B(0; 4\delta))$. Thus,
$h_t(u)$ is a well-defined $C^2$-smooth function on $B(0;\delta)$. Finally, note
$\|D^2f_t(\tau)\| \le \|D^2h(t + \tau)\| \le K$.

Next it is shown that
$M_{(h_t,\delta)} = R_xR_z^{-1}Q_{(t,h(t))}(M_{(h,6\delta)}) \cap P_1^{-1}(B(0;\delta))$.
Indeed, if $(u, h_t(u)) \in M_{(h_t,\delta)}$ then $\tau = f_t^{-1}(u)$ is well-defined
and $(t + \tau, h(t + \tau)) \in M_{(h,6\delta)}$ is such that
$(u, h_t(u)) = R_xR_z^{-1}Q_{(t,h(t))}(t + \tau, h(t + \tau))$. Inversely, an arbitrary
element $p$ of $R_xR_z^{-1}Q_{(t,h(t))}(M_{(h,6\delta)})$ is of the form
$p = R_xR_z^{-1}(\tau, h(t + \tau) - h(t))$ for some $\tau \in B(-t; 6\delta)$. Since
$P_1(p) = f_t(\tau)$, if $p \in P_1^{-1}(B(0;\delta))$ then
$u = f_t(\tau) \in B(0;\delta)$ and $p = (u, h_t(u)) \in M_{(h_t,\delta)}$. Because
$(t, h(t)) = R_zQ_z(x)$, it follows that $R_xR_z^{-1}Q_{(t,h(t))}R_zQ_z = R_xQ_x$.
Therefore,
$M_{(h_t,\delta)} = R_xQ_x(M) \cap R_xQ_xQ_z^{-1}R_z^{-1}(U) \cap P_1^{-1}(B(0;\delta))$,
proving $R_xQ_x(M)$ locally looks like $M_{(h_t,\delta)}$.

To show $h_t \in \mathfrak{H}_\delta$, first note $h_t(0) = 0$ and $Dh_t(0) = 0$, the
latter a consequence of $P_2R_xR_z^{-1}(\xi, Dh(t) \cdot \xi) = 0$ for all $\xi$. It
remains to bound $\|D^2h_t(u)\|$. For an arbitrary $u \in B(0;\delta)$, let
$\tau = f_t^{-1}(u) \in B(0; 4\delta)$. Then
$D^2h_t(u) \cdot \xi^2 = P_2R_xR_z^{-1}\big(D^2f_t^{-1}(u) \cdot \xi^2,\ D^2h(t + \tau) \cdot (Df_t^{-1}(u) \cdot \xi)^2 + Dh(t + \tau)\, D^2f_t^{-1}(u) \cdot \xi^2\big)$.
Now, $Df_t^{-1}(u) = (Df_t(\tau))^{-1}$ and
$D^2f_t^{-1}(u) \cdot \xi^2 = -(Df_t(\tau))^{-1} D^2f_t(\tau) \cdot ((Df_t(\tau))^{-1} \cdot \xi)^2$.
Applying the earlier bounds shows $\|D^2h_t(u)\| \le \bar K$ where
$\bar K = \sqrt{(8K)^2 + (4K + 2K)^2} = 10K$.
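
The constant $10K$ can be checked term by term. The following is only a restatement of the arithmetic, using the bounds $\|(Df_t(\tau))^{-1}\| \le 2$, $\|D^2f_t(\tau)\| \le K$, $\|D^2h\| \le K$ and $\|Dh\| \le 1/4$ obtained earlier in the proof, and assuming (as is the case for a rotation followed by a coordinate projection) that $P_2R_xR_z^{-1}$ does not increase norms.

```latex
\begin{align*}
\|D^2 f_t^{-1}(u)\|
  &\le \|(Df_t(\tau))^{-1}\| \, \|D^2 f_t(\tau)\| \, \|(Df_t(\tau))^{-1}\|^2
   \le 2 \cdot K \cdot 2^2 = 8K, \\
\|D^2 h(t+\tau) \cdot (Df_t^{-1}(u)\cdot\xi)^2
  + Dh(t+\tau)\, D^2 f_t^{-1}(u)\cdot\xi^2\|
  &\le \bigl(K \cdot 2^2 + \tfrac14 \cdot 8K\bigr)\|\xi\|^2
   = (4K + 2K)\|\xi\|^2, \\
\|D^2 h_t(u)\|
  &\le \sqrt{(8K)^2 + (6K)^2} = 10K .
\end{align*}
```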

With the values of $\delta$ and $\bar K$ as above, and referring to D4-D6, define
$\alpha = \sup_{K' \in [0, \bar K]} \alpha(\delta, K')$,
$\beta = \sup_{K' \in [0, \bar K]} \beta(\delta, K')$, and
$\rho = \inf_{K' \in [0, \bar K]} \rho(\delta, K')$. Shrink $\rho$ if necessary to ensure
$\rho < \delta$ and $B_{n+k}(0; 2\rho) \subset U$. Thus, for any
$x \in M \cap B_{n+k}(z; \rho)$ there exists a $t \in B(0; 6\delta)$ such that
$x = Q_z^{-1}R_z^{-1}(t, h(t))$, in which case
$\|t\| \le \|R_z^{-1}(t, h(t))\| = \|x - z\| < \rho < \delta$. In particular then,
$R_xQ_x(M)$ locally looks like $M_{(h_t,\delta)}$. For the particular triple $x$, $h_t$
and $\delta$, define $\pi$ as in (45). From D1-D3 it follows that $\pi$ is defined on
$B(0; \rho)$ and satisfies $\|D^2\pi(0)\| \le \alpha$ and
$\|\pi(y) - y\| \le \beta\|y\|^2$ for $y \in B(0; \rho)$.

The proof is completed by showing $\phi$ satisfies the assumptions in Lemma ETT1.
Let $\bar y \in B(0; \rho) \cap T_xM$ be arbitrary. Define $y = P_1R_x\bar y$ and note
$R_x\bar y = (y, 0)$. From D1 and (45), since $\|y\| = \|\bar y\| < \rho$,
$\phi_x(\bar y) = Q_x^{-1}R_x^{-1}(\pi(y), h_t \circ \pi(y))$. Then $\phi_x(\bar y)$ lies
in $M$ because $R_xQ_x(M)$ locally looks like $M_{(h_t,\delta)}$. That $\phi_x(0) = x$
follows from $\pi(0) = 0$ (D3) and $h_t(0) = 0$. The facts $\pi(0) = 0$,
$D\pi(0) = I$ (D3) and $Dh_t(0) = 0$ imply $D\phi_x(0) \cdot \bar y = \bar y$ and
$D^2\phi_x(0) \cdot \bar y^2 = R_x^{-1}(D^2\pi(0) \cdot y^2, D^2h_t(0) \cdot y^2)$. Thus
$\|D^2\phi_x(0) \cdot \bar y^2\| \le \sqrt{\alpha^2 + \bar K^2}\, \|\bar y\|^2$. Lemma E]
implies $\|h_t(u)\| \le \frac12 \bar K\|u\|^2$. Therefore,
$\|h_t \circ \pi(y)\| \le \frac12 \bar K(\|\pi(y) - y\|^2 + \|y\|^2) \le \frac12 \bar K(1 + \beta^2\rho^2)\|y\|^2$.
Thus,
$\|\phi_x(\bar y) - x - \bar y\| = \|(\pi(y) - y, h_t \circ \pi(y))\| \le \sqrt{\beta^2 + (1/4)\bar K^2(1 + \beta^2\rho^2)^2}\, \|y\|^2$. □

8.4. Local and Global Projections

For an embedded manifold $M \subset \mathbb{R}^{n+k}$, it was suggested in [8] that
projection from the affine tangent plane to the manifold could be used to define
parametrisations. The utility of Proposition 34 is illustrated by proving such
parametrisations satisfy C1-C2. Only ordinary calculus is required as the
differential geometric framework is hidden behind Proposition 34. Note the proof
works at the generality of $C^2$-smooth manifolds and is thus not based on a
smoothness argument (Lemma 27).

Lemma 35. Let $h : B_n(0;\delta) \to \mathbb{R}^k$ be a $C^2$-smooth map with $h(0) = 0$
and $Dh(0) = 0$. Let $K = \sup_{x \in B(0;\delta)} \|D^2h(x)\|$ and
$\rho = \frac12 \min\{\delta, \sqrt{2}/(3K)\}$. There exists a $C^1$-smooth map
$\pi : B_n(0; \rho) \to B_n(0; 2\rho)$ such that, for any $x \in B(0; \rho)$,
$(\pi(x), h \circ \pi(x))$ is the unique point on the manifold
$\{(t, h(t)) \mid t \in B(0;\delta)\}$ closest to $(x, 0)$. Moreover, no point is closest
to $(x, 0)$ on the smaller manifold $\{(t, h(t)) \mid \|t\| < \|\pi(x)\|\}$. The map
$\pi$ satisfies $\pi(0) = 0$, $D\pi(0) = I$, $D^2\pi(0) = 0$ and
$\|\pi(x) - x\| \le \frac12 K\|x\|^2$.

Proof. Define $f(x) = x + g(x)$ and $g(x) = (Dh(x))^T h(x)$ where superscript $T$
denotes adjoint. Then $(f(x), 0)$ is the unique point of intersection of the affine
plane normal to the manifold at $(x, h(x))$ with the plane $\mathbb{R}^n \times \{0\}$.
Thus if $\pi$ exists it must satisfy $f(\pi(x)) = x$.

From Lemma CETl, $\|h(x)\| \le (K/2)\|x\|^2$ and $\|Dh(x)\| \le K\|x\|$ for
$x \in B(0;\delta)$. Therefore, $\|g(x)\| \le (1/2)K^2\|x\|^3$,
$\|Dg(x)\| \le (3/2)K^2\|x\|^2$,
$\|Df(x) \cdot \xi\| \ge (1 - (3/2)K^2\|x\|^2)\|\xi\|$ and
$\|Df(x) - Df(y)\| \le (3/2)K^2(\|x\|^2 + \|y\|^2)$ for $x, y \in B(0;\delta)$. The
latter implies $D^2f(0) = 0$. If $x, y \in B(0; 2\rho)$ then
$\|(Df(x))^{-1}\| \le 3/2$ and $\|Df(x) - Df(y)\| \le 2/3$ so Lemma 1 of
[5, Chapter 16] and the inverse function theorem imply $f$ restricted to
$B(0; 2\rho)$ is a $C^1$-diffeomorphism.

Let $x \in B(0; \rho)$. Since the manifold includes the origin, a distance $\|x\|$ away
from $(x, 0)$, the closest point(s) to $(x, 0)$ on the original manifold are the same
as the closest point(s) on the smaller manifold
$\{(t, h(t)) \mid t \in \overline{B}(0; 2\|x\|)\}$. The latter manifold is compact and
hence a closest point exists. Uniqueness follows from $f$ being injective on
$B(0; 2\rho) \supset \overline{B}(0; 2\|x\|)$; any closest point $(t, h(t))$ must satisfy
$f(t) = x$. Since this is a local condition, it also means no point is closest
to $(x, 0)$ on $\{(t, h(t)) \mid \|t\| < \|\pi(x)\|\}$. Note that $\|\pi(x)\| \le 2\|x\|$.

The geometric bound $\|\pi(x) - x\| \le \|h(x)\|$ implies
$\|\pi(x) - x\| \le (K/2)\|x\|^2$, so $\pi(0) = 0$ and $D\pi(0) = I$. As $f$ is a
$C^1$-diffeomorphism on $B(0; 2\rho)$, $\pi(x) = f^{-1}(x)$ for $x \in B(0; \rho)$ is
$C^1$-smooth and $D\pi = (Df \circ \pi)^{-1}$. For $\epsilon > 0$, choose
$\bar\delta > 0$ such that $\|y\| < 2\bar\delta$ implies $\|y\| \le 2\|f(y)\|$ and
$\|I - Df(y)\| \le \epsilon\|y\| < \frac12$. (This is possible because $D^2f(0) = 0$.)
Then for $\|x\| < \min\{\rho, \bar\delta\}$ and $y = \pi(x)$,
$\|D\pi(x) - I\| = \|(I - (I - Df(y)))^{-1} - I\| \le (1 - \|I - Df(y)\|)^{-1} - 1 \le 2\epsilon\|y\| \le 4\epsilon\|f(y)\| = 4\epsilon\|x\|$,
proving $D^2\pi(0) = 0$. □
It is mentioned tangentially that the cubic bound
$\|\pi(x) - x\| = \|f(\pi(x)) - \pi(x)\| \le 4K^2\|x\|^3$ is readily obtainable from the
above proof.
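
The projection in Lemma 35 can be tried numerically. The sketch below is only an illustration under stated assumptions: it takes $n = k = 1$ and the hypothetical graph $h(t) = \frac12 K t^2$ (so that $\sup |h''| = K$), and computes $\pi(x)$ by solving $f(t) = t + h'(t)h(t) = x$ with a scalar Newton iteration. The function names `project_onto_graph` and `rho` are not from the paper.

```python
import math


def project_onto_graph(x, K, tol=1e-14, max_iter=50):
    """Return pi(x) for the illustrative graph h(t) = 0.5*K*t**2 (n = k = 1).

    Solves f(t) = t + h'(t)*h(t) = t + 0.5*K**2*t**3 = x by Newton's method,
    starting from t = x. Here f is strictly increasing, so the root is unique
    and is the parameter of the closest point on the graph to (x, 0).
    """
    t = x
    for _ in range(max_iter):
        residual = t + 0.5 * K**2 * t**3 - x
        slope = 1.0 + 1.5 * K**2 * t**2
        step = residual / slope
        t -= step
        if abs(step) < tol:
            break
    return t


def rho(delta, K):
    """Radius appearing in Lemma 35: 0.5 * min(delta, sqrt(2) / (3*K))."""
    return 0.5 * min(delta, math.sqrt(2.0) / (3.0 * K))
```

For $|x| < \rho(\delta, K)$ the computed value can be compared against the quadratic bound $\frac12 K x^2$ of the lemma and the cubic bound $4K^2|x|^3$ mentioned above.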

Given a rule such as (44), a parametrisation of a manifold can be obtained
by using (45) in reverse. The only technicality is the choice of neighbourhood
size $\delta$ to use for each point on the manifold. The actual choice is generally
not important provided a positive uniform lower bound exists on any compact
neighbourhood. In fact, as is the case in the following proposition, the choice may
depend on $y$ as well as $x$ in (45).

Proposition 36. Let $M \subset \mathbb{R}^{n+k}$ be an embedded $C^2$-manifold of
dimension $n$. Adopting the notation of Section 8.3, associate with each $x \in M$ and
$y \in \mathbb{R}^n$ a $\delta_{xy} > 0$, an $h_{xy} \in \mathfrak{H}_{\delta_{xy}}$ and a
rotation $R_x$ such that: 1) $R_xQ_x(M)$ locally looks like
$M_{(h_{xy},\delta_{xy})}$; and, 2) if no point of $M_{(h_{xy},\delta_{xy})}$ is closest
to $(y, 0)$ then the same is true for any admissible choice of $\delta_{xy}$. Referring
to (45) and (44), if $(t, h_{xy}(t)) \in M_{(h_{xy},\delta_{xy})}$ is the unique closest
point to $(y, 0)$ then set
$\tilde\phi(x, R_x^{-1}Jy) = Q_x^{-1}R_x^{-1}(t, h_{xy}(t))$. Otherwise, if the closest
point does not exist or is not unique, let $\tilde\phi(x, R_x^{-1}Jy)$ be an arbitrary
element of $M$. Then the parametrisation $\phi : TM \to M$,
$\phi = \tilde\phi \circ \tau_1$, satisfies condition D.

Proof. The $k = 0$ case is straightforward so assume $k > 0$. Define the functions
$\rho(\delta, K) = \frac12 \min\{\delta, \sqrt{2}/(3K)\}$, $\alpha(\delta, K) = 0$ and
$\beta(\delta, K) = K/2$; they satisfy D4-D6. Next, choose $x \in M$, $\delta > 0$ and
$h \in \mathfrak{H}_\delta$ such that $R_xQ_x(M)$ locally looks like $M_{(h,\delta)}$.
Let $K = \sup_{t \in B(0;\delta)} \|D^2h(t)\|$. Then for $y \in B(0; \rho(\delta, K))$
define $\pi(y)$ as in Lemma 35; the unique closest point to $(y, 0)$ on
$M_{(h,\delta)}$ is $(\pi(y), h \circ \pi(y))$. This must therefore correspond with the
$\pi$ in (45). That D1-D3 hold follows immediately from Lemma 35. □

To assist in interpreting Proposition 36, consider how the projection from
$(y, 0)$ onto $M_{(h,\delta)}$ changes with $\delta$. If $\delta$ is too small then no
point is necessarily closest because $M_{(h,\delta)}$ is not compact. As $\delta$
increases the closest point may change as more candidates become available. The
advantage of Proposition 36 in practice is it allows parametrisations to be defined
using only local minima of the Euclidean distance function rather than insisting on
global minima. Note too that projecting onto $M_{(h,\delta)}$ is different from
projecting onto $M$ because $M$ may curve around and come close to touching itself.

Proposition 37. Let $M \subset \mathbb{R}^{n+k}$ be an embedded $n$-dimensional
$C^2$-manifold. Let $a : TM \to \mathbb{R}^{n+k}$ be the map taking a tangent vector to
its equivalent point on the affine tangent plane. Let $\phi : TM \to M$ be any map with
the property that $\|\phi(v_p) - a(v_p)\| = \min_{q \in M} \|q - a(v_p)\|$ whenever the
minimum exists, where the norm is the Euclidean norm on $\mathbb{R}^{n+k}$. Then $\phi$
and $\psi = \phi$ satisfy C1 and C2 of Section 6.

Proof. The $k = 0$ case is straightforward so assume $k > 0$. For $z \in M$, define
$\rho(z) = \sup\{\rho \mid \exists h \in \mathfrak{H}_\rho,\ \exists \text{ an open } U \supset B_{n+k}(0; \rho),\ R_zQ_z(M) \cap U = M_{(h,\rho)}\}$;
see Section 8.3 for notation. Let $K \subset M$ be a compact set and assume to
the contrary there exists a convergent sequence $z_i \to \bar z$ in $K$ with
$\rho(z_i) \to 0$. It follows from the proof of Proposition 34 that at $\bar z$ there
exist a $\delta > 0$ and an $h \in \mathfrak{H}_{6\delta}$ such that (by shrinking
$\delta$ if necessary) $M_{(h,6\delta)} = R_{\bar z}Q_{\bar z}(M) \cap U$,
$U \supset B_{n+k}(0; 6\delta)$ and for any $z \in B(\bar z; \delta) \cap M$, there
exists an $h_z \in \mathfrak{H}_\delta$ such that
$M_{(h_z,\delta)} = R_zQ_z(M) \cap R_zQ_zQ_{\bar z}^{-1}R_{\bar z}^{-1}(U) \cap P_1^{-1}(B_n(0; \delta))$.
Since
$B_{n+k}(0; \delta) \subset R_zQ_zQ_{\bar z}^{-1}R_{\bar z}^{-1}(B_{n+k}(0; 6\delta))$
it follows that $\rho(z) \ge \delta$ for $\|z - \bar z\| < \delta$, a contradiction.
Thus $\inf_{z \in K} \rho(z) > 0$.

For $x \in M$ and $y \in B_n(0; \rho(x)/4)$ define $\delta_{xy} = \rho(x)/2$ and
$\tilde\phi(x, R_x^{-1}Jy) = \phi \circ \tau_1^{-1}(x, R_x^{-1}Jy)$. The closest point
to $(y, 0)$ on $R_xQ_x(M)$ must be contained in $B_{n+k}(0; \rho(x)/2)$ and hence is in
the local representation $M_{(h_{xy},\delta_{xy})}$. It is therefore possible to define
$\delta_{xy}$ and $\tilde\phi(x, R_x^{-1}Jy)$ for $y \notin B_n(0; \rho(x)/4)$ so that
$\tilde\phi$ satisfies the conditions of Proposition 36 (and hence by Proposition 34
the corresponding parametrisation satisfies C1 and C2). By Lemma [501, because
$\phi(v_x) = \tilde\phi \circ \tau_1(v_x)$ whenever $\|v_x\| < \rho(x)/4$, $\phi$ and
$\psi$ satisfy C1 and C2. □

Since any manifold can be embedded in $\mathbb{R}^{n+k}$ for sufficiently large $k$,
Proposition 37 guarantees the existence of parametrisations satisfying C1-C2.
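
As a concrete instance of the projection parametrisation in Proposition 37, consider the unit sphere, which is an illustrative choice and not an example taken from the text. There the closest point to any non-zero $q$ is $q/\|q\|$, so the minimum always exists. The sketch below, with hypothetical helper names, measures the distance between $\phi(v_p)$ and the point $p + v$ on the affine tangent plane; for a tangent vector $v \perp p$ this distance equals $\sqrt{1 + \|v\|^2} - 1 \le \frac12\|v\|^2$, the kind of second-order agreement that the conditions of Section 6 are concerned with.

```python
import math


def project_to_sphere(q):
    """Closest point on the unit sphere to a non-zero point q."""
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]


def phi(p, v):
    """Projection parametrisation: send the tangent vector v at p to the sphere.

    Assumes p is a unit vector and v is orthogonal to p.
    """
    return project_to_sphere([pc + vc for pc, vc in zip(p, v)])


def second_order_gap(p, v):
    """Euclidean distance between phi(v_p) and the affine tangent point p + v."""
    x = phi(p, v)
    return math.sqrt(sum((xc - pc - vc) ** 2 for xc, pc, vc in zip(x, p, v)))
```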

8.5. Discussion 

To the best of our knowledge, all Newton-like methods on finite-dimensional
manifolds in the literature can be rewritten as (39) where the parametrisations
$\phi$ and $\psi$ are smooth. Theorem 24 and Lemma 27 together imply that such
Newton-like methods have local quadratic convergence. As a specific example,
the original Riemannian Newton method in [4] uses the Riemannian exponential
map for the parametrisations $\phi$ and $\psi$; cf. (2). It is a standard result that if
$M$ is $C^4$-smooth then Exp is $C^2$-smooth on a neighbourhood of the zero section,
and moreover, $\mathrm{Exp}_p(0) = p$ and $D\mathrm{Exp}_p(0) = I$. Therefore, the
Riemannian Newton method (2) has local quadratic convergence by Remark 28.

In addition to introducing the general framework (3), the article [8] applied
the framework to the (real and complex) Grassmann and Stiefel manifolds, with
parametrisations chosen to be global projections from Euclidean space onto the
affine tangent planes of the Stiefel manifold, and an analogous choice made for
the Grassmann manifold by treating it as a quotient space of the Stiefel manifold.
Local quadratic convergence follows from Proposition 37 and Section 8.1.

When sufficient smoothness is not present for Lemma 27 to be applicable, the
proofs in Section 5 demonstrate that essentially all the effort goes into obtaining
uniform bounds. Condition D in Section 8.3 is one illustration of this.

The conjecture made in Section [4J] applies to Newton-like methods on manifolds
too. It is difficult to see how any iterative scheme can fail to be of the
form (39) if it uses only the information in the two-jet of $f$ about the current
point to converge locally quadratically to a non-degenerate critical point for a
sufficiently rich class of functions $f$.

An advantage of expressing an algorithm in the form (39) is that it gives the
algorithm the following heuristic interpretation: at each step, the parametrisation
$\phi$ endeavours to make $f \circ \phi$ look as quadratic as possible, while $\psi$
endeavours to map the result back to the manifold as cheaply as possible; cf. Section 4.3.
Additionally, the fundamental idea of re-centring can further simplify matters.

Finally, it is remarked that affine connections and parallel transport can be
used to construct parametrisations. This can be understood in terms of the
classical notion of development in differential geometry (a manifold $M$ can
be rolled along an affine space without slipping), which shows that this is a
representative example of the re-centring technique in Section 8.3.

9. Iterates Computing Coordinate Independent Properties 

This section studies how iteration functions besides the Newton iteration 
function can be lifted from Euclidean space to manifolds. This necessitates 
introducing a rudimentary theory of iterative methods computing coordinate 
independent properties. It also studies further the generalised Newton method 
at a grass-roots level. 

First, the concept of converging to an identifiable point of $f$ needs defining.
Example 40 may prove illuminative.

Definition 38. Assign to each $f : \mathrm{Dom} f \subset \mathbb{R}^n \to \mathbb{R}$ a
subset $P_f \subset \mathrm{Int}\,\mathrm{Dom} f$ of the interior of the domain of $f$.
The property $P = \{P_f\}$ is $C^k$-coordinate independent if $x \in P_f$ implies
$\phi^{-1}(x) \in P_{f \circ \phi}$ for every $C^k$-diffeomorphism $\phi$ of
open sets in $\mathbb{R}^n$ with $x$ in the image of $\phi$.

Henceforth, $f : \mathbb{R}^n \to \mathbb{R}$ will mean
$f : \mathrm{Dom} f \subset \mathbb{R}^n \to \mathbb{R}$ with an implicit
requirement that a particular point be in the domain of $f$ whenever necessary.
For example, $f$ being $C^k$-smooth at $x$ implicitly requires $x \in \mathrm{Dom} f$.

Two functions $f, g : \mathbb{R}^n \to \mathbb{R}$ are $k$-jet equivalent at
$p \in \mathbb{R}^n$ if $f$ and $g$ are $C^k$-smooth in a neighbourhood of $p$ and
$f(p) = g(p)$, $Df(p) = Dg(p)$, ..., $D^kf(p) = D^kg(p)$.

Definition 39. A $k$th-order iterative method is the assignment of an iteration
function $N_f : \mathbb{R}^n \to \mathbb{R}^n$ to each $f : \mathbb{R}^n \to \mathbb{R}$
where $N_f(p) = N_g(p)$ whenever $f$ and $g$ are $k$-jet equivalent at $p$. An iterative
method computes the property $P = \{P_f\}$ with rate $K$ if, for any given $f$ and
$x^* \in P_f$, the iterate $N_f$ converges locally to $x^*$ with rate $K$.

Example 40. Let $P_f$ be the set of points $x$ such that $f$ is $C^3$-smooth in a
neighbourhood of $x$, and $x$ is a non-degenerate critical point of $f$. Then $P$
is $C^3$-coordinate independent and the Newton iterate $N_f$ in (1) is a 2nd-order
iterative method that computes $P$ with rate 2.

Example 41. In Example 40, $P_f$ can be instead the set of points $x^*$ satisfying
the conditions in Theorem 4, including (5). Then $P$ is a $C^2$-coordinate
independent property (see Remark 7) computed by the Newton iterate (1).

Remark 42. It follows from Definition 38 by using the identity map $\phi : U \to U$
that if $P$ is a $C^k$-coordinate independent property, $U$ is an open subset of
$\mathbb{R}^n$ and $g = f|_U$ is the restriction to $U$ of a function
$f : \mathbb{R}^n \to \mathbb{R}$ then $P_f \cap U \subset P_g$. The
converse is not implied; properties can be "forgotten" as the domain increases.
Stricter definitions are not necessary for what follows.

Properties of functions in $\mathbb{R}^n$ lift to properties of functions on manifolds.

Definition 43. Let $M$ be a manifold with maximal atlas $\mathcal{A}$ of $C^k$-smooth
charts $(U, \varphi)$. Let $P = \{P_f\}$ be a $C^{k'}$-coordinate independent property
with $k' \le k$. For any $f : M \to \mathbb{R}$, define
$\bar P_f = \{p \in M \mid \exists (U, \varphi) \in \mathcal{A},\ p \in U,\ \varphi(p) \in P_{f \circ \varphi^{-1}}\}$.
The elements of $\bar P_f$ are said to have property $P$.

Lemma 44. Let $P = \{P_f\}$ be a $C^k$-coordinate independent property with $k \ge 1$.
Let $N$ be an iterative method computing $P$ with rate $K > 1$. Fix an
$f : \mathbb{R}^n \to \mathbb{R}$ and $x^* \in P_f$. Let $\phi$ be a
$C^k$-diffeomorphism of open subsets of $\mathbb{R}^n$ whose image contains $x^*$.
Then the iteration function

$$\tilde N_f = \phi \circ N_{f \circ \phi} \circ \phi^{-1} \qquad (46)$$

converges locally with rate $K$ to $x^*$.

Proof. Since $\phi^{-1}(x^*) \in P_{f \circ \phi}$, $N_{f \circ \phi}$ converges locally
to $\phi^{-1}(x^*)$ with rate $K$. Furthermore, $\phi$ is bi-Lipschitz in a suitably
small neighbourhood of $\phi^{-1}(x^*)$ because $k \ge 1$. Lemma I^TI completes the
proof. □
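
The effect described by Lemma 44 can be observed in one dimension. The sketch below is an illustration under assumptions that are not in the text: the cost $f(x) = e^x - 2x$ (minimiser $x^* = \ln 2$), the diffeomorphism $\phi(y) = y + y^3$ of $\mathbb{R}$, and the Newton iterate as the iterative method. Iterating $N_{f \circ \phi}$ in the $y$ coordinate and mapping each iterate through $\phi$ is the same as iterating the conjugated map (46) in the $x$ coordinate, so the recorded errors are those of (46).

```python
import math


# Illustrative cost f(x) = exp(x) - 2x with minimiser x* = ln 2.
def df(x):
    return math.exp(x) - 2.0


def d2f(x):
    return math.exp(x)


# Illustrative diffeomorphism of R: phi(y) = y + y**3.
def phi(y):
    return y + y**3


def dphi(y):
    return 1.0 + 3.0 * y**2


def d2phi(y):
    return 6.0 * y


def newton_step_in_y(y):
    """One Newton step for g = f o phi, with g' and g'' from the chain rule."""
    x = phi(y)
    g1 = df(x) * dphi(y)
    g2 = d2f(x) * dphi(y) ** 2 + df(x) * d2phi(y)
    return y - g1 / g2


def conjugated_newton_errors(y0, steps):
    """Errors |x_j - x*| for x_{j+1} = phi(N_{f o phi}(phi^{-1}(x_j))), x_0 = phi(y0)."""
    x_star = math.log(2.0)
    y, errors = y0, []
    for _ in range(steps + 1):
        errors.append(abs(phi(y) - x_star))
        y = newton_step_in_y(y)
    return errors
```

Starting from `y0 = 0.6`, each error is bounded by a modest multiple of the square of the previous one until rounding error dominates, which is consistent with the rate being preserved under the change of coordinates.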

Lemma 44 suggests the coordinate-adapted viewpoint used to generalise the
Newton method to manifolds may prove beneficial in more general contexts.
The remainder of this section elicits this idea.

The convergence proofs for the coordinate adapted Newton method in Section 4
and the generalised Newton method in Section 6 relied on essentially just
two properties of the Newton iterate: invariance to affine coordinate changes,
and a lower bound on the radius of convergence of the Newton iterate
$N_{f \circ \phi}$ in terms of the second-order behaviour of $\phi$. Here, radius of
convergence refers to $\rho$ in Section 2.

Affine invariance was exploited partially for convenience (it meant only
parametrisations $\phi_p$ with $\phi_p(0_p) = p$ and $D\phi_p(0_p) = I$ were needed) and
partially to allow the Newton iterate (1) to be applied unambiguously to the
abstract vector space $T_pM$. The reason for using tangent spaces was again for
convenience. It made it easier to exploit smoothness when possible. Section 7
discussed this in detail.

Affine invariance was also used for re-centring. In (27), the change of coordinate
transformations $\phi_x$ do not change the point $x$, that is, $\phi_x(x) = x$. This
is perhaps the most natural choice for $\phi_x(x)$ as it does not shift the space
unnecessarily. When lifting an iteration function to a manifold, such a choice is no
longer possible. The proposed solution was to choose a distinguished point of
$\mathbb{R}^n$, the origin, and always apply the Newton iteration function at this
distinguished point; cf. (3) and (39). The invariance of the Newton method to shifts
made this inconsequential.

If the iterative method $N_f$ is not shift-invariant then re-centring it at each
iteration may alter its behaviour. It is therefore necessary to study the
re-centred iterate $x_{k+1} = \theta_{x_k} \circ N_{f \circ \theta_{x_k}}(0)$ where
$\theta_x(y) = x + y$. Equivalently, $N$ can be replaced by its re-centred version
$\bar N_f(x) = \theta_x \circ N_{f \circ \theta_x}(0)$ which is shift-invariant:
$\theta_z \circ \bar N_{f \circ \theta_z} \circ \theta_z^{-1}(x) = \bar N_f(x)$.
Henceforth the iterative method $N_f$ is assumed to be shift-invariant.
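
The re-centring construction can be checked on a toy method. In the sketch below every name is hypothetical: a cost function on $\mathbb{R}$ is represented by its first two derivatives, `damped` is a made-up second-order method whose update depends explicitly on the position $x$ (so it is not shift-invariant), and `recentre` builds the re-centred version $x \mapsto \theta_x \circ N_{f \circ \theta_x}(0)$.

```python
import math


class Cost:
    """A cost function on R represented by its first two derivatives."""

    def __init__(self, d1, d2):
        self.d1, self.d2 = d1, d2


def shift(f, z):
    """Return f o theta_z, where theta_z(y) = z + y."""
    return Cost(lambda y: f.d1(z + y), lambda y: f.d2(z + y))


def damped(f, x):
    """A hypothetical second-order method that is NOT shift-invariant."""
    return x - f.d1(x) / (f.d2(x) + x * x)


def recentre(method):
    """Re-centred version of a method: x -> theta_x o N_{f o theta_x}(0)."""
    return lambda f, x: x + method(shift(f, x), 0.0)


def conjugate_by_shift(method, f, z, x):
    """Evaluate theta_z o N_{f o theta_z} o theta_z^{-1} at x."""
    return z + method(shift(f, z), x - z)
```

For this particular `damped`, re-centring removes the position-dependent term and recovers the ordinary Newton step; conjugating the re-centred method by a shift leaves it unchanged, whereas conjugating `damped` itself does not.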

If the iterative method $N_f$ is affine-invariant then smooth parametrisations
$\phi : TM \to M$ can be used to lift $N_f$ to manifolds in the same way the Newton
method was lifted. Otherwise, parametrisations from $M \times \mathbb{R}^n$ rather than
$TM$ need be considered if a global approach is taken. A simpler and more general
alternative is to construct parametrisations locally, as in Section leading to 
path-dependent lifts of Nf. 

Henceforth, a local viewpoint is adopted because determining how to make 
parametrisations constructed locally fit together globally is a topological prob- 
lem unrelated to local convergence properties and which needs no addressing if 
a path-dependent lift is adequate. 

In one sense, lifting a shift-invariant iterative method to a manifold locally 
about a point is straightforward. 

Proposition 45. As in Definition 43, let $M$ be a $C^k$-smooth manifold, let
$(U, \varphi)$ be a chart on $M$, and let $\bar P$ be the lift to $M$ of a
$C^{k'}$-coordinate independent property $P$, with $k' \le k$. Let $N$ be an iterative
method of order at most $k$ that computes $P$ with rate $K$. Define
$E_f(p) = \varphi^{-1} \circ \theta_{\varphi(p)} \circ N_{f \circ \varphi^{-1} \circ \theta_{\varphi(p)}}(0)$
for $p \in U$, where $\theta_x(y) = x + y$. Then $E_f$ computes $\bar P$ on $U$ with
rate $K$, meaning for any cost function $f : M \to \mathbb{R}$ and any
$p^* \in U \cap \bar P_f$, the iteration function $E_f$ converges locally with rate $K$
to $p^*$ (Definition 23).

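Proof. Let $(V, \psi)$ be such that $\psi(p^*) \in P_{f \circ \psi^{-1}}$. Shrink $U$ if
necessary so that $p^* \in U \subset V$. Since $P$ is coordinate invariant, the
diffeomorphism $\varphi \circ \psi^{-1}$ can be used to show
$\varphi(p^*) \in P_{f \circ \varphi^{-1}}$. By Definition 23, it suffices to study
$\hat N(x) = \varphi \circ E_f \circ \varphi^{-1}(x) = \theta_x \circ N_{(f \circ \varphi^{-1}) \circ \theta_x}(0) = \theta_x \circ N_{(f \circ \varphi^{-1}) \circ \theta_x} \circ \theta_x^{-1}(x) = N_{(f \circ \varphi^{-1})}(x)$
where the last equality follows from shift-invariance. Since
$x^* = \varphi(p^*) \in P_{f \circ \varphi^{-1}}$ and $N$ computes $P$, $\hat N$
converges locally to $x^*$ with rate $K$, as required. □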
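
The lift in Proposition 45 can be made concrete on the unit circle. Everything specific in the sketch below is an assumption for illustration: the angle chart on the circle minus the point $(-1, 0)$, the cost $f(p) = -\langle a, p \rangle$ (minimised at $a/\|a\|$), and the Newton iterate as the shift-invariant method $N$. The function `lifted_step` evaluates $E_f(p)$ by forming the cost in the re-centred chart coordinates and applying $N$ at the origin.

```python
import math


def chart(p):
    """Angle chart on the unit circle minus the point (-1, 0)."""
    return math.atan2(p[1], p[0])


def chart_inv(theta):
    """Inverse of the angle chart."""
    return (math.cos(theta), math.sin(theta))


def newton_at_origin(d1, d2):
    """Newton iterate N_g(0) for a function g given by its derivatives d1, d2."""
    return 0.0 - d1(0.0) / d2(0.0)


def lifted_step(p, a):
    """E_f(p) = chart^{-1} o theta_x o N_{f o chart^{-1} o theta_x}(0), x = chart(p),
    for the illustrative cost f(p) = -<a, p>."""
    x = chart(p)
    # Derivatives of y -> f(chart_inv(x + y)) = -(a0*cos(x + y) + a1*sin(x + y)).
    d1 = lambda y: a[0] * math.sin(x + y) - a[1] * math.cos(x + y)
    d2 = lambda y: a[0] * math.cos(x + y) + a[1] * math.sin(x + y)
    return chart_inv(x + newton_at_origin(d1, d2))
```

Repeatedly applying `lifted_step` from a starting point inside the chart domain and close enough to $a/\|a\|$ drives the iterates to that point.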

Allowing more flexibility than afforded by Proposition 45 is desirable for two
reasons: Section 4.3 explained how customised parametrisations can improve
performance for certain classes of cost functions, and Section 8 gave techniques
for adapting parametrisations to geometric features of the manifold.

The most general way found for lifting a Newton method to a manifold
is (39). Furthermore, the use of $\psi$ in (39) is an add-on: if
$E_f(p) = \phi_p \circ N_{f \circ \phi_p}(0_p)$ converges then (39) will also converge
with the same rate provided $\psi$ is a sufficiently good approximation to $\phi$. All
that remains then is to understand when $E_f(p) = \phi_p \circ N_{f \circ \phi_p}(0)$
computes $\bar P$ with rate $K$ given that $N$ computes $P$ with rate $K$, as in
Proposition 45. Note that here, $\phi$ is defined on $U \times \mathbb{R}^n$
where $U$ is an open subset of $M$, and $\phi_p(0) = p$. It is also necessary for
$\phi_p$ to be a local diffeomorphism about 0, that is, a genuine change of coordinates.

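As in the proof of Proposition 45, consider
$\hat N(x) = \varphi \circ E_f \circ \varphi^{-1}(x)$. Using the fact that
$(\varphi \circ \phi_{\varphi^{-1}(x)} \circ \theta_{-x})(x) = x$ for all $x$, this becomes

$$\hat N(x) = \sigma_x \circ N_{\tilde f \circ \sigma_x} \circ \sigma_x^{-1}(x), \quad \sigma_x = \varphi \circ \phi_{\varphi^{-1}(x)} \circ \theta_{-x}, \quad \tilde f = f \circ \varphi^{-1}. \qquad (47)$$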
Assume $N_{\tilde f}$ converges with rate $K$ to $x^*$. Then $\hat N(x^*) = x^*$. The
trick for seeing how $\hat N$ converges locally to $x^*$ is to use (46) to remove the
coordinate change $\sigma_{x^*}$ from (47). Precisely, the iterative method $N_g$ in
(47) is replaced by the iterative method
$\tilde N_g = \sigma_{x^*} \circ N_{g \circ \sigma_{x^*}} \circ \sigma_{x^*}^{-1}$. By
Lemma 44, this change will not alter the rate $K$ of convergence provided $K > 1$.
(Recall from Section 2 that the $K = 1$ case is more delicate.) Thus,

$$\hat N(x) = \psi_x \circ \tilde N_{\tilde f \circ \psi_x} \circ \psi_x^{-1}(x), \quad \psi_x = \varphi \circ \phi_{\varphi^{-1}(x)} \circ \theta_{x^* - x} \circ \phi_{\varphi^{-1}(x^*)}^{-1} \circ \varphi^{-1}. \qquad (48)$$

As arranged, $\psi_{x^*}$ is the identity. If $N$, and hence $\tilde N$, is reasonably
nice then the radius of convergence (equivalently, the constant $K$ in Section 2)
associated with (48) should remain bounded if $\psi_x$ remains sufficiently close to
the identity. Indeed, all (48) is doing is applying $\tilde N$ to the cost function
$\tilde f$ in the coordinate system determined by $\psi_x$. At the end of the day,
lifting iterative methods to manifolds relies on this one simple principle: that the
iterative method be robust to changes of coordinates.

10. Conclusion 

The Newton method (1) traditionally lifts to manifolds by endowing the
manifold with a Riemannian structure and using (2). Although this strategy
has merit, it provides limited insight, and may have a high computational cost 
when implemented. This motivates the study, from first principles, of lifting 
iterative methods from Euclidean space to manifolds. 

The main contribution is identifying the pivotal role played by coordinate 
changes. Changing coordinates at each iteration is a novel yet easily understood 
and applied technique for enhancing the performance of iterative methods in 
Euclidean space (Section S3]). Robustness to coordinate changes is key to lifting 
iterative methods to manifolds in useful ways (Section ^ . 

Newton-like methods on manifolds are defined customarily as iteration functions
$E_f : M \to M$. This is unnecessarily restrictive; allowing $E_f$ to depend on
past history leads to path-dependent Newton methods (Section]?]), and a change 
in focus from devising parametrisations to devising transformations (Section [8]). 
The simplifications stemming from this generalisation are a consequence of elim- 
inating the need for local lifts to agree globally; global agreement is a topological 
problem with little bearing on the computational problem of iteratively finding 
a critical point. While smooth global lifts of the Newton method always exist, 
global agreement may not be possible if the iterative method is not invariant to
affine transformations and the manifold is non-parallelisable (Section 7).

The proposed framework for lifting the Newton method to manifolds is perhaps
the most general one possible (Section 6): the condition on the cost function
in Theorem 24 agrees with the necessary and sufficient condition in Theorem 4
for the Euclidean case, and it is difficult to see how conditions C1-C2
can be weakened (cf. Section |4"7T]) . 